Principal Component Analysis (PCA) is widely used in data exploration, dimension reduction, and data visualization. The aim is to transform the original variables into uncorrelated linear combinations of them while retaining as much of the information (variance) in the data as possible. High-dimensional data often reveal clusters when viewed in a lower-dimensional space.
Clustering analysis is another form of exploratory data analysis (EDA). Here we hope to group data points so that points within a group are close to each other and points in different groups are far apart. Clustering using PCs can be effective. Clustering analysis can be very subjective, because we need to summarize the properties within each group ourselves.
Both PCA and clustering analysis are forms of so-called unsupervised learning: no response variable is involved in the process.
For supervised learning, we try to find out how a set of predictors relates to some response variable of interest. Multiple regression is still, by far, one of the most popular methods. We use a linear model as a working model for its simplicity and interpretability. It is important to use domain knowledge as much as we can to determine the form of the response as well as the functional form of the predictors.
Self-esteem generally describes a person’s overall sense of self-worth and personal value. It can play a significant role in one’s motivation and success throughout life. Factors that influence self-esteem include inner thinking, health condition, age, and life experiences. We will try to identify factors in our data that are related to the level of self-esteem.
The well-cited National Longitudinal Survey of Youth (NLSY79) follows about 13,000 individuals, and extensive individual-year information has been gathered through surveys. The survey data are open to the public. From the many variables we assembled a subset including personal demographic variables in different years, household environment in 79, ASVAB test scores in 81, and self-esteem scores in 81 and 87.
The data is stored in NLSY79.csv.
Here is the description of the variables:
Personal Demographic Variables
Household Environment
Variables Related to ASVAB test Scores in 1981
| Test | Description |
|---|---|
| AFQT | percentile score on the AFQT intelligence test in 1981 |
| Coding | score on the Coding Speed test in 1981 |
| Auto | score on the Automotive and Shop test in 1981 |
| Mechanic | score on the Mechanic test in 1981 |
| Elec | score on the Electronics Information test in 1981 |
| Science | score on the General Science test in 1981 |
| Math | score on the Math test in 1981 |
| Arith | score on the Arithmetic Reasoning test in 1981 |
| Word | score on the Word Knowledge Test in 1981 |
| Parag | score on the Paragraph Comprehension test in 1981 |
| Number | score on the Numerical Operations test in 1981 |
Self-Esteem test 81 and 87
We have two sets of self-esteem tests, one in 1981 and the other in 1987. Each set has the same 10 questions. They are labeled `Esteem81` and `Esteem87` respectively, followed by the question number. For example, `Esteem81_1` is esteem question 1 in 81.
The following 10 questions are answered as 1: strongly agree, 2: agree, 3: disagree, 4: strongly disagree
Load the data. Do a quick EDA to get familiar with the data set. Pay attention to the unit of each variable. Are there any missing values?
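A minimal sketch of this step is given here. Because `NLSY79.csv` may not be at hand, the sketch writes a tiny stand-in file with made-up rows (the file contents and the name `nlsy_toy` are assumptions for illustration); with the real file one would pass its path to `read.csv()` instead.

```r
# Toy stand-in for NLSY79.csv so the sketch runs anywhere (hypothetical values).
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Subject,Gender,Income87,HeightFeet05,Job05",
  "2,female,16000,5,Sales",
  "6,male,-2,5,",
  "7,male,0,-4,Teachers"
), toy_path)

nlsy_toy <- read.csv(toy_path, stringsAsFactors = FALSE)

str(nlsy_toy)      # variable types and the first few values
summary(nlsy_toy)  # ranges help reveal implausible values such as negatives

# Count negative entries in each numeric column.
num_cols <- vapply(nlsy_toy, is.numeric, logical(1))
n_negative <- colSums(nlsy_toy[, num_cols, drop = FALSE] < 0, na.rm = TRUE)
print(n_negative)
```

In the summary output of the real data, the minimum of `Income87` is -2 and the minimum of `HeightFeet05` is -4, which are the kind of peculiar values this check is meant to flag.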
## 'data.frame': 2431 obs. of 46 variables:
## $ Subject : int 2 6 7 8 9 13 16 17 18 20 ...
## $ Gender : chr "female" "male" "male" "female" ...
## $ Education05 : int 12 16 12 14 14 16 13 13 13 17 ...
## $ Income87 : int 16000 18000 0 9000 15000 2200 27000 20000 28000 27000 ...
## $ Job05 : chr "4700 TO 4960: Sales and Related Workers" "10 TO 430: Executive, Administrative and Managerial Occupations" "7900 TO 8960: Setters, Operators and Tenders" "5000 TO 5930: Office and Administrative Support Workers" ...
## $ Income05 : int 5500 65000 19000 36000 65000 8000 71000 43000 120000 64000 ...
## $ Weight05 : int 160 187 175 246 180 235 160 188 173 130 ...
## $ HeightFeet05 : int 5 5 5 5 5 6 5 5 5 5 ...
## $ HeightInch05 : int 2 5 9 3 6 0 4 10 9 4 ...
## $ Imagazine : int 1 0 1 1 1 1 1 1 1 1 ...
## $ Inewspaper : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Ilibrary : int 1 1 1 1 1 1 1 1 1 1 ...
## $ MotherEd : int 5 12 12 9 12 12 12 12 12 12 ...
## $ FatherEd : int 8 12 12 6 10 16 12 15 16 18 ...
## $ FamilyIncome78: int 20000 35000 8502 7227 17000 20000 48000 15000 4510 50000 ...
## $ Science : int 6 23 14 18 17 16 13 19 22 21 ...
## $ Arith : int 8 30 14 13 21 30 17 29 30 17 ...
## $ Word : int 15 35 27 35 28 29 30 33 35 28 ...
## $ Parag : int 6 15 8 12 10 13 12 13 14 14 ...
## $ Number : int 29 45 32 24 40 36 49 35 48 39 ...
## $ Coding : int 52 68 35 48 46 30 58 58 61 54 ...
## $ Auto : int 9 21 13 11 13 21 11 18 21 18 ...
## $ Math : int 6 23 11 4 13 24 17 21 23 20 ...
## $ Mechanic : int 10 21 9 12 13 19 11 19 16 20 ...
## $ Elec : int 5 19 11 12 15 16 10 16 17 13 ...
## $ AFQT : num 6.84 99.39 47.41 44.02 59.68 ...
## $ Esteem81_1 : int 1 2 2 1 1 1 2 2 2 1 ...
## $ Esteem81_2 : int 1 1 1 1 1 1 2 2 2 1 ...
## $ Esteem81_3 : int 4 4 3 3 4 4 3 3 3 3 ...
## $ Esteem81_4 : int 1 2 2 2 1 1 2 2 2 1 ...
## $ Esteem81_5 : int 3 4 3 3 1 4 3 3 3 3 ...
## $ Esteem81_6 : int 3 2 2 2 1 1 2 2 2 2 ...
## $ Esteem81_7 : int 1 2 2 3 1 1 3 2 2 1 ...
## $ Esteem81_8 : int 3 4 2 3 4 4 3 3 3 3 ...
## $ Esteem81_9 : int 3 3 3 3 4 4 3 3 3 3 ...
## $ Esteem81_10 : int 3 4 3 3 4 4 3 3 3 3 ...
## $ Esteem87_1 : int 2 1 2 1 1 1 1 2 1 1 ...
## $ Esteem87_2 : int 1 1 2 1 1 1 1 2 1 1 ...
## $ Esteem87_3 : int 4 4 4 3 4 4 4 3 4 4 ...
## $ Esteem87_4 : int 1 1 2 1 1 1 2 2 1 4 ...
## $ Esteem87_5 : int 2 4 4 4 4 4 4 3 4 4 ...
## $ Esteem87_6 : int 2 1 2 2 1 1 2 2 1 1 ...
## $ Esteem87_7 : int 2 2 2 1 1 2 2 2 2 1 ...
## $ Esteem87_8 : int 3 3 4 2 4 4 4 3 4 3 ...
## $ Esteem87_9 : int 3 2 3 2 4 4 3 3 3 4 ...
## $ Esteem87_10 : int 4 4 4 2 4 4 4 3 4 4 ...
## Subject Gender Education05 Income87
## Min. : 2 Length:2431 Min. : 6.0 Min. : -2
## 1st Qu.: 1592 Class :character 1st Qu.:12.0 1st Qu.: 4500
## Median : 3137 Mode :character Median :13.0 Median :12000
## Mean : 3504 Mean :13.9 Mean :13399
## 3rd Qu.: 4668 3rd Qu.:16.0 3rd Qu.:19000
## Max. :12140 Max. :20.0 Max. :59387
## Job05 Income05 Weight05 HeightFeet05
## Length:2431 Min. : 63 Min. : 81 Min. :-4.00
## Class :character 1st Qu.: 22650 1st Qu.:150 1st Qu.: 5.00
## Mode :character Median : 38500 Median :180 Median : 5.00
## Mean : 49415 Mean :183 Mean : 5.18
## 3rd Qu.: 61350 3rd Qu.:209 3rd Qu.: 5.00
## Max. :703637 Max. :380 Max. : 8.00
## HeightInch05 Imagazine Inewspaper Ilibrary MotherEd
## Min. : 0.00 Min. :0.000 Min. :0.000 Min. :0.00 Min. : 0.0
## 1st Qu.: 2.00 1st Qu.:0.000 1st Qu.:1.000 1st Qu.:1.00 1st Qu.:11.0
## Median : 5.00 Median :1.000 Median :1.000 Median :1.00 Median :12.0
## Mean : 5.32 Mean :0.718 Mean :0.861 Mean :0.77 Mean :11.7
## 3rd Qu.: 8.00 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.00 3rd Qu.:12.0
## Max. :11.00 Max. :1.000 Max. :1.000 Max. :1.00 Max. :20.0
## FatherEd FamilyIncome78 Science Arith Word
## Min. : 0.0 Min. : 0 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.:10.0 1st Qu.:11167 1st Qu.:13.0 1st Qu.:13.0 1st Qu.:23.0
## Median :12.0 Median :20000 Median :17.0 Median :19.0 Median :28.0
## Mean :11.8 Mean :21252 Mean :16.3 Mean :18.6 Mean :26.6
## 3rd Qu.:14.0 3rd Qu.:27500 3rd Qu.:20.0 3rd Qu.:25.0 3rd Qu.:32.0
## Max. :20.0 Max. :75001 Max. :25.0 Max. :30.0 Max. :35.0
## Parag Number Coding Auto Math
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.0
## 1st Qu.:10.0 1st Qu.:29.0 1st Qu.:38.0 1st Qu.:10.0 1st Qu.: 9.0
## Median :12.0 Median :36.0 Median :48.0 Median :14.0 Median :14.0
## Mean :11.2 Mean :35.5 Mean :47.1 Mean :14.3 Mean :14.3
## 3rd Qu.:14.0 3rd Qu.:44.0 3rd Qu.:57.0 3rd Qu.:18.0 3rd Qu.:20.0
## Max. :15.0 Max. :50.0 Max. :84.0 Max. :25.0 Max. :25.0
## Mechanic Elec AFQT Esteem81_1 Esteem81_2
## Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. :1.00 Min. :1.00
## 1st Qu.:11.0 1st Qu.: 9.0 1st Qu.: 31.9 1st Qu.:1.00 1st Qu.:1.00
## Median :14.0 Median :12.0 Median : 57.0 Median :1.00 Median :1.00
## Mean :14.4 Mean :11.6 Mean : 54.7 Mean :1.42 Mean :1.42
## 3rd Qu.:18.0 3rd Qu.:15.0 3rd Qu.: 78.2 3rd Qu.:2.00 3rd Qu.:2.00
## Max. :25.0 Max. :20.0 Max. :100.0 Max. :4.00 Max. :4.00
## Esteem81_3 Esteem81_4 Esteem81_5 Esteem81_6 Esteem81_7
## Min. :1.00 Min. :1.00 Min. :1.00 Min. :1.00 Min. :1.00
## 1st Qu.:3.00 1st Qu.:1.00 1st Qu.:3.00 1st Qu.:1.00 1st Qu.:1.00
## Median :4.00 Median :2.00 Median :4.00 Median :2.00 Median :2.00
## Mean :3.51 Mean :1.57 Mean :3.46 Mean :1.62 Mean :1.75
## 3rd Qu.:4.00 3rd Qu.:2.00 3rd Qu.:4.00 3rd Qu.:2.00 3rd Qu.:2.00
## Max. :4.00 Max. :4.00 Max. :4.00 Max. :4.00 Max. :4.00
## Esteem81_8 Esteem81_9 Esteem81_10 Esteem87_1 Esteem87_2
## Min. :1.00 Min. :1.00 Min. :1.0 Min. :1.00 Min. :1.0
## 1st Qu.:3.00 1st Qu.:3.00 1st Qu.:3.0 1st Qu.:1.00 1st Qu.:1.0
## Median :3.00 Median :3.00 Median :3.0 Median :1.00 Median :1.0
## Mean :3.13 Mean :3.16 Mean :3.4 Mean :1.38 Mean :1.4
## 3rd Qu.:4.00 3rd Qu.:4.00 3rd Qu.:4.0 3rd Qu.:2.00 3rd Qu.:2.0
## Max. :4.00 Max. :4.00 Max. :4.0 Max. :4.00 Max. :4.0
## Esteem87_3 Esteem87_4 Esteem87_5 Esteem87_6 Esteem87_7
## Min. :1.00 Min. :1.0 Min. :1.00 Min. :1.00 Min. :1.00
## 1st Qu.:3.00 1st Qu.:1.0 1st Qu.:3.00 1st Qu.:1.00 1st Qu.:1.00
## Median :4.00 Median :1.0 Median :4.00 Median :2.00 Median :2.00
## Mean :3.58 Mean :1.5 Mean :3.53 Mean :1.59 Mean :1.72
## 3rd Qu.:4.00 3rd Qu.:2.0 3rd Qu.:4.00 3rd Qu.:2.00 3rd Qu.:2.00
## Max. :4.00 Max. :4.0 Max. :4.00 Max. :4.00 Max. :4.00
## Esteem87_8 Esteem87_9 Esteem87_10
## Min. :1.0 Min. :1.00 Min. :1.00
## 1st Qu.:3.0 1st Qu.:3.00 1st Qu.:3.00
## Median :3.0 Median :3.00 Median :3.00
## Mean :3.1 Mean :3.06 Mean :3.37
## 3rd Qu.:4.0 3rd Qu.:4.00 3rd Qu.:4.00
## Max. :4.0 Max. :4.00 Max. :4.00
## [1] ""
## [2] "10 TO 430: Executive, Administrative and Managerial Occupations"
## [3] "1000 TO 1240: Mathematical and Computer Scientists"
## [4] "1300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians"
## [5] "1600 TO 1760: Physical Scientists"
## [6] "1800 TO 1860: Social Scientists and Related Workers"
## [7] "1900 TO 1960: Life, Physical and Social Science Technicians"
## [8] "2000 TO 2060: Counselors, Sociala and Religious Workers"
## [9] "2100 TO 2150: Lawyers, Judges and Legal Support Workers"
## [10] "2200 TO 2340: Teachers"
## [11] "2400 TO 2550: Education, Training and Library Workers"
## [12] "2600 TO 2760: Entertainers and Performers, Sports and Related Workers"
## [13] "2800 TO 2960: Media and Communications Workers"
## [14] "3000 TO 3260: Health Diagnosing and Treating Practitioners"
## [15] "3300 TO 3650: Health Care Technical and Support Occupations"
## [16] "3700 TO 3950: Protective Service Occupations"
## [17] "4000 TO 4160: Food Preparation and Serving Related Occupations"
## [18] "4200 TO 4250: Cleaning and Building Service Occupations"
## [19] "4300 TO 4430: Entertainment Attendants and Related Workers"
## [20] "4500 TO 4650: Personal Care and Service Workers"
## [21] "4700 TO 4960: Sales and Related Workers"
## [22] "500 TO 950: Management Related Occupations"
## [23] "5000 TO 5930: Office and Administrative Support Workers"
## [24] "6000 TO 6130: Farming, Fishing and Forestry Occupations"
## [25] "6200 TO 6940: Construction Trade and Extraction Workers"
## [26] "7000 TO 7620: Installation, Maintenance and Repairs Workers"
## [27] "7700 TO 7750: Production and Operating Workers"
## [28] "7800 TO 7850: Food Preparation Occupations"
## [29] "7900 TO 8960: Setters, Operators and Tenders"
## [30] "9000 TO 9750: Transportation and Material Moving Workers"
## [31] "9990: Uncodeable"
##
##
## 56
## 10 TO 430: Executive, Administrative and Managerial Occupations
## 377
## 1000 TO 1240: Mathematical and Computer Scientists
## 64
## 1300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians
## 53
## 1600 TO 1760: Physical Scientists
## 4
## 1800 TO 1860: Social Scientists and Related Workers
## 6
## 1900 TO 1960: Life, Physical and Social Science Technicians
## 7
## 2000 TO 2060: Counselors, Sociala and Religious Workers
## 41
## 2100 TO 2150: Lawyers, Judges and Legal Support Workers
## 15
## 2200 TO 2340: Teachers
## 120
## 2400 TO 2550: Education, Training and Library Workers
## 29
## 2600 TO 2760: Entertainers and Performers, Sports and Related Workers
## 24
## 2800 TO 2960: Media and Communications Workers
## 13
## 3000 TO 3260: Health Diagnosing and Treating Practitioners
## 74
## 3300 TO 3650: Health Care Technical and Support Occupations
## 99
## 3700 TO 3950: Protective Service Occupations
## 54
## 4000 TO 4160: Food Preparation and Serving Related Occupations
## 68
## 4200 TO 4250: Cleaning and Building Service Occupations
## 67
## 4300 TO 4430: Entertainment Attendants and Related Workers
## 10
## 4500 TO 4650: Personal Care and Service Workers
## 42
## 4700 TO 4960: Sales and Related Workers
## 205
## 500 TO 950: Management Related Occupations
## 108
## 5000 TO 5930: Office and Administrative Support Workers
## 360
## 6000 TO 6130: Farming, Fishing and Forestry Occupations
## 9
## 6200 TO 6940: Construction Trade and Extraction Workers
## 135
## 7000 TO 7620: Installation, Maintenance and Repairs Workers
## 108
## 7700 TO 7750: Production and Operating Workers
## 49
## 7800 TO 7850: Food Preparation Occupations
## 4
## 7900 TO 8960: Setters, Operators and Tenders
## 112
## 9000 TO 9750: Transportation and Material Moving Workers
## 117
## 9990: Uncodeable
## 1
## [1] "Subject" "Gender" "Education05" "Income87"
## [5] "Job05" "Income05" "Weight05" "HeightFeet05"
## [9] "HeightInch05" "Imagazine" "Inewspaper" "Ilibrary"
## [13] "MotherEd" "FatherEd" "FamilyIncome78" "Science"
## [17] "Arith" "Word" "Parag" "Number"
## [21] "Coding" "Auto" "Math" "Mechanic"
## [25] "Elec" "AFQT" "Esteem81_1" "Esteem81_2"
## [29] "Esteem81_3" "Esteem81_4" "Esteem81_5" "Esteem81_6"
## [33] "Esteem81_7" "Esteem81_8" "Esteem81_9" "Esteem81_10"
## [37] "Esteem87_1" "Esteem87_2" "Esteem87_3" "Esteem87_4"
## [41] "Esteem87_5" "Esteem87_6" "Esteem87_7" "Esteem87_8"
## [45] "Esteem87_9" "Esteem87_10"
Paired visualization of income in 87 and income in 05.
Family income against income in 87 and in 2005, with gender and weight.
How many people believe they are a person of worth in 87, and how is this affected by gender and income in 05?
By parents reading the newspaper.
Feeling useless (1: strongly agree, 2: agree, 3: disagree, 4: strongly disagree).
Of those who agreed with feeling useless in 87, what were their income in 87 and their weight?
## nas blanks
## 1 0 56
The 56 blanks match the 56 entries with an empty job type in the `Job05` table above, so it is possible that the missing values all come from missing job types. These people may be unemployed, which is different from having a job that was “uncodeable”. I therefore decided not to remove these entries, since they could still be informative given that several individuals report an income of 0.
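The `nas` and `blanks` counts shown above could be produced along the following lines. This is a sketch under assumptions: the helper name `count_missing` and the toy data frame are hypothetical, and a "blank" is taken to mean an empty or whitespace-only string in a character column.

```r
# Count NA values and blank strings in a data frame.
count_missing <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  # Blank = non-NA character entry that is empty after trimming whitespace.
  n_blank <- sum(vapply(
    df[is_chr],
    function(col) sum(!is.na(col) & trimws(col) == ""),
    integer(1)
  ))
  data.frame(nas = sum(is.na(df)), blanks = n_blank)
}

# Toy data: one NA in a numeric column, two blanks in a character column.
toy <- data.frame(
  Job05 = c("Teachers", "", " ", "Sales"),
  Income05 = c(100, NA, 0, 50),
  stringsAsFactors = FALSE
)
count_missing(toy)
```

Keeping the two counts separate matters here because `read.csv()` leaves empty character fields as `""` rather than `NA`, so `is.na()` alone would not reveal them.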
Let's concentrate on the Esteem scores evaluated in 87.
The following 10 questions are answered as 1: strongly agree, 2: agree, 3: disagree, 4: strongly disagree
Esteem variables.
Pay attention to missing values, any peculiar numbers, etc. How do you fix any problems you discover? Briefly describe what you have done for the data preparation.
## nas blanks
## 1 0 0
There appear to be no blanks or NAs in the Esteem87 variables.
Reverse Esteem 1, 2, 4, 6, and 7 so that a higher score corresponds to higher self-esteem. (Hint: if we store the esteem data in `data.esteem`, then `data.esteem[, c(1, 2, 4, 6, 7)] <- 5 - data.esteem[, c(1, 2, 4, 6, 7)]` reverses the scores.)
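The reversal in the hint can be tried on a small made-up data set. The toy responses below are random values on the 1 to 4 scale and are only an assumption for illustration; the column positions follow the hint.

```r
set.seed(1)
# Toy responses: 3 subjects, 10 questions, each answered 1..4.
esteem_toy <- as.data.frame(matrix(sample(1:4, 30, replace = TRUE), nrow = 3))
names(esteem_toy) <- paste0("Esteem87_", 1:10)

reversed <- c(1, 2, 4, 6, 7)  # questions where agreement signals high esteem
data.esteem <- esteem_toy
# On a 1..4 scale, 5 - x maps 1 <-> 4 and 2 <-> 3.
data.esteem[, reversed] <- 5 - data.esteem[, reversed]

data.esteem
```

After this step a score of 4 means the highest self-esteem on every question, which is what makes the later averages, correlations and PC1 loadings comparable across items.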
Write a brief summary with necessary plots about the 10 esteem measurements.
Are esteem scores all positively correlated? Report the pairwise correlation table and write a brief summary.
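One way to check the sign of all pairwise correlations is sketched here on simulated items. The data are an assumption: four toy questions driven by one shared latent trait and cut to a 1 to 4 scale, so they are not the NLSY79 responses.

```r
set.seed(42)
n <- 200
latent <- rnorm(n)  # shared trait that drives every item
items <- sapply(1:4, function(j) {
  raw <- latent + rnorm(n, sd = 0.8)
  # Discretize to a 1..4 response scale.
  as.integer(cut(raw, breaks = c(-Inf, -1, 0, 1, Inf)))
})
colnames(items) <- paste0("Q", 1:4)

cor_mat <- round(cor(items), 2)  # pairwise Pearson correlation table
cor_mat

# TRUE when every off-diagonal correlation is positive.
all_positive <- all(cor_mat[upper.tri(cor_mat)] > 0)
all_positive
```

For the real data, the same two calls would be applied to the ten reverse-coded `Esteem87` columns.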
PCA on 10 esteem measurements. (centered but no scaling)
## Esteem87_1 Esteem87_2 Esteem87_3 Esteem87_4 Esteem87_5 Esteem87_6
## 0 0 0 0 0 0
## Esteem87_7 Esteem87_8 Esteem87_9 Esteem87_10
## 0 0 0 0
## Esteem87_1 Esteem87_2 Esteem87_3 Esteem87_4 Esteem87_5 Esteem87_6
## 3.62 3.60 3.58 3.50 3.53 3.41
## Esteem87_7 Esteem87_8 Esteem87_9 Esteem87_10
## 3.28 3.10 3.06 3.37
## Esteem87_1 Esteem87_2 Esteem87_3 Esteem87_4 Esteem87_5 Esteem87_6
## 0.499 0.501 0.542 0.533 0.600 0.562
## Esteem87_7 Esteem87_8 Esteem87_9 Esteem87_10
## 0.582 0.738 0.740 0.653
## Esteem87_1 Esteem87_2 Esteem87_3 Esteem87_4 Esteem87_5 Esteem87_6
## 0.499 0.501 0.542 0.533 0.600 0.562
## Esteem87_7 Esteem87_8 Esteem87_9 Esteem87_10
## 0.582 0.738 0.740 0.653
After centering, every variable has mean 0, while the standard deviations are the same as in the uncentered data.
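The effect of centering without scaling can be verified on a small example. The toy data frame `x` is an assumption for illustration.

```r
x <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, 4, 4))

# Subtract each column mean; leave the spread untouched.
x_centered <- scale(x, center = TRUE, scale = FALSE)

colMeans(x_centered)       # zero for every column, up to rounding
apply(x, 2, sd)            # standard deviations before centering
apply(x_centered, 2, sd)   # identical after centering
```

Since the esteem items share the same 1 to 4 scale, leaving them unscaled keeps items with more spread (such as questions 8 and 9) more influential in the PCA.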
## Standard deviations (1, .., p=10):
## [1] 1.297 0.678 0.572 0.520 0.461 0.433 0.375 0.367 0.350 0.271
##
## Rotation (n x k) = (10 x 10):
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## Esteem87_1 0.235 -0.374 0.0569 -0.0124 0.3933 -0.00997 0.20077 -0.35223
## Esteem87_2 0.244 -0.367 0.0618 0.0280 0.3775 -0.00899 0.15108 -0.34216
## Esteem87_3 0.279 -0.149 0.1208 -0.4363 -0.0784 -0.01155 0.59518 0.56802
## Esteem87_4 0.261 -0.321 0.0669 0.1349 0.2635 -0.12372 -0.60850 0.48811
## Esteem87_5 0.312 -0.131 0.0610 -0.5921 -0.3611 0.40370 -0.40708 -0.25747
## Esteem87_6 0.313 -0.209 -0.0707 0.3569 -0.2480 -0.03221 -0.03448 0.22521
## Esteem87_7 0.299 -0.163 -0.1221 0.4974 -0.5116 0.20487 0.20311 -0.15285
## Esteem87_8 0.393 0.332 -0.8212 -0.1159 0.2114 -0.04713 -0.00257 0.00553
## Esteem87_9 0.398 0.578 0.4418 0.2114 0.2824 0.42803 0.02333 0.05408
## Esteem87_10 0.376 0.260 0.2843 -0.0837 -0.2217 -0.77004 -0.06034 -0.23373
## PC9 PC10
## Esteem87_1 -0.06564 -0.69120
## Esteem87_2 -0.00632 0.72013
## Esteem87_3 0.10496 0.02271
## Esteem87_4 0.33400 -0.04001
## Esteem87_5 -0.07386 0.00175
## Esteem87_6 -0.78175 0.01641
## Esteem87_7 0.50187 -0.03359
## Esteem87_8 0.02638 0.00319
## Esteem87_9 -0.03174 -0.00839
## Esteem87_10 0.05393 -0.00694
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## Standard deviation 1.297 0.678 0.5717 0.520 0.4610 0.4330 0.375 0.3668
## Proportion of Variance 0.466 0.127 0.0905 0.075 0.0588 0.0519 0.039 0.0373
## Cumulative Proportion 0.466 0.593 0.6837 0.759 0.8175 0.8694 0.908 0.9457
## PC9 PC10
## Standard deviation 0.3499 0.2713
## Proportion of Variance 0.0339 0.0204
## Cumulative Proportion 0.9796 1.0000
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
## PC1 1 0 0 0 0 0 0 0 0 0
## PC2 0 1 0 0 0 0 0 0 0 0
## PC3 0 0 1 0 0 0 0 0 0 0
## PC4 0 0 0 1 0 0 0 0 0 0
## PC5 0 0 0 0 1 0 0 0 0 0
## PC6 0 0 0 0 0 1 0 0 0 0
## PC7 0 0 0 0 0 0 1 0 0 0
## PC8 0 0 0 0 0 0 0 1 0 0
## PC9 0 0 0 0 0 0 0 0 1 0
## PC10 0 0 0 0 0 0 0 0 0 1
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
## 1 1 1 1 1 1 1 1 1 1
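The loading vectors are mutually orthogonal and each has unit length: their cross-product matrix is the identity, and every squared norm equals 1.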
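The orthonormality check can be reproduced as follows. This is a sketch on simulated data (the matrix `toy` is an assumption); for the report, `pca` would be the `prcomp()` fit on the esteem scores.

```r
set.seed(7)
toy <- matrix(rnorm(100 * 4), ncol = 4)
pca <- prcomp(toy, center = TRUE, scale. = FALSE)

# t(V) %*% V: off-diagonals are dot products between different loadings,
# diagonals are squared lengths of each loading vector.
gram <- crossprod(pca$rotation)
round(gram, 10)

# Squared norm of each loading vector.
colSums(pca$rotation^2)
```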
b) Are there good interpretations for PC1 and PC2? (If loadings are all negative, take the positive loadings for the ease of interpretation)
Loadings determine the contribution of each variable to the PCs.
All ten PC1 loadings are positive, so PC1 is a weighted sum of the esteem items and can be read as an overall self-esteem score. The PC2 loadings are negative for questions 1 to 7 and positive for the final three, so PC2 contrasts questions 8, 9 and 10 with the rest.
Five PC1 loadings exceed 0.300 (questions 5, 6, 8, 9 and 10, with question 7 just below at 0.299), so these items contribute the most to PC1. Esteem87_9 is one of the three items with a positive PC2 loading and, at 0.578, also has the strongest contribution to PC2.
c) How is the PC1 score obtained for each subject? Write down the formula.
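PC1 = 0.235 x Esteem87_1 + 0.244 x Esteem87_2 + 0.279 x Esteem87_3 + 0.261 x Esteem87_4 + 0.312 x Esteem87_5 + 0.313 x Esteem87_6 + 0.299 x Esteem87_7 + 0.393 x Esteem87_8 + 0.398 x Esteem87_9 + 0.376 x Esteem87_10, where each Esteem87 variable is the reverse-coded score after centering, that is, with its sample mean subtracted (for example, Esteem87_1 - 3.62).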
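The formula can be checked against the scores that `prcomp()` returns. The sketch below uses simulated data with three toy columns (an assumption); the same steps would apply to the ten esteem items.

```r
set.seed(11)
toy <- matrix(rnorm(50 * 3), ncol = 3,
              dimnames = list(NULL, c("q1", "q2", "q3")))
pca <- prcomp(toy, center = TRUE, scale. = FALSE)

# Subtract the column means that prcomp() used for centering.
centered <- sweep(toy, 2, pca$center)

# PC1 score = centered data times the first loading vector.
pc1_manual <- as.vector(centered %*% pca$rotation[, 1])

# Agrees with the scores stored in the fitted object.
all.equal(pc1_manual, unname(pca$x[, 1]))
```

Omitting the centering step would shift every subject's score by the same constant, so the ordering of subjects would be unchanged but the scores would no longer average to zero.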
d) Are PC1 scores and PC2 scores in the data uncorrelated?
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
## PC1 1.68 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.0000
## PC2 0.00 0.459 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.0000
## PC3 0.00 0.000 0.327 0.000 0.000 0.000 0.000 0.000 0.000 0.0000
## PC4 0.00 0.000 0.000 0.271 0.000 0.000 0.000 0.000 0.000 0.0000
## PC5 0.00 0.000 0.000 0.000 0.212 0.000 0.000 0.000 0.000 0.0000
## PC6 0.00 0.000 0.000 0.000 0.000 0.187 0.000 0.000 0.000 0.0000
## PC7 0.00 0.000 0.000 0.000 0.000 0.000 0.141 0.000 0.000 0.0000
## PC8 0.00 0.000 0.000 0.000 0.000 0.000 0.000 0.135 0.000 0.0000
## PC9 0.00 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.122 0.0000
## PC10 0.00 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.0736
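The covariance matrix of the scores is diagonal, so the PC1 and PC2 scores (and every other pair of PCs) are uncorrelated. The diagonal entries are the variances of the scores, for example 1.68 for PC1, which is the square of its standard deviation 1.297.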
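A covariance table like the one above can be computed from the score matrix. This sketch uses simulated, deliberately correlated columns (an assumption) to show that the scores come out uncorrelated even when the inputs are not.

```r
set.seed(3)
toy <- matrix(rnorm(200 * 4), ncol = 4)
toy[, 2] <- toy[, 1] + 0.5 * toy[, 2]  # make two raw columns correlated
pca <- prcomp(toy, center = TRUE, scale. = FALSE)

# Covariance matrix of the PC scores.
score_cov <- cov(pca$x)
round(score_cov, 3)

# Largest off-diagonal entry in absolute value; zero up to rounding error.
max_off_diag <- max(abs(score_cov[upper.tri(score_cov)]))
max_off_diag

# Diagonal entries equal the squared standard deviations of the PCs.
all.equal(unname(diag(score_cov)), pca$sdev^2)
```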
e) Plot PVE (Proportion of Variance Explained) and summarize the plot.
## [1] 1.68 0.46 0.33 0.27 0.21 0.19 0.14 0.13 0.12 0.07
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## Standard deviation 1.297 0.678 0.5717 0.520 0.4610 0.4330 0.375 0.3668
## Proportion of Variance 0.466 0.127 0.0905 0.075 0.0588 0.0519 0.039 0.0373
## Cumulative Proportion 0.466 0.593 0.6837 0.759 0.8175 0.8694 0.908 0.9457
## PC9 PC10
## Standard deviation 0.3499 0.2713
## Proportion of Variance 0.0339 0.0204
## Cumulative Proportion 0.9796 1.0000
PC1 captures a large percentage of the variance, and each subsequent principal component explains less variance than the preceding one.
PC1 captures about 47% of the variance (0.466) and PC2 about 13% (0.127).
f) Also plot CPVE (Cumulative Proportion of Variance Explained). What proportion of the variance in the data is explained by the first two principal components?
Cumulatively, about 59% of the variance in this data (0.593) is explained by the first two PCs.
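PVE and CPVE follow directly from the standard deviations of the components. The sketch below types in the ten standard deviations printed in the summary table; with a fitted object one would use its `sdev` element instead. The plotting calls are guarded so the block also runs as a script.

```r
# Standard deviations of PC1..PC10 as printed in the importance table.
sdev <- c(1.297, 0.678, 0.5717, 0.520, 0.4610,
          0.4330, 0.375, 0.3668, 0.3499, 0.2713)

pve  <- sdev^2 / sum(sdev^2)  # proportion of variance explained by each PC
cpve <- cumsum(pve)           # cumulative proportion

round(pve, 3)
round(cpve, 3)

if (interactive()) {
  plot(pve, type = "b", xlab = "Principal component", ylab = "PVE")
  plot(cpve, type = "b", xlab = "Principal component", ylab = "CPVE",
       ylim = c(0, 1))
}
```

Because the printed standard deviations are rounded, the results agree with the table to about three decimal places rather than exactly.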
g) PC’s provide us with a low dimensional view of the self-esteem scores. Use a biplot with the first two PC's to display the data. Give an interpretation of PC1 and PC2 from the plot. (try `ggbiplot` if you could, much prettier!)
This biplot shows that the PC1 loadings are similar in magnitude and sign. It also shows that PC2 captures the difference between the responses to questions 8, 9 and 10 and the responses to the other questions. This suggests that questions 8, 9 and 10 are more correlated with each other than with the other questions (features).
Apply k-means to cluster subjects on the original esteem scores
b) Can you summarize common features within each cluster?
## K-means clustering with 2 clusters of sizes 1264, 1167
##
## Cluster means:
## Esteem87_1 Esteem87_2 Esteem87_3 Esteem87_4 Esteem87_5 Esteem87_6 Esteem87_7
## 1 3.37 3.33 3.28 3.20 3.18 3.06 2.96
## 2 3.89 3.89 3.90 3.82 3.90 3.78 3.63
## Esteem87_8 Esteem87_9 Esteem87_10
## 1 2.70 2.67 2.97
## 2 3.53 3.49 3.80
##
## Clustering vector:
## [1] 1 2 2 1 2 2 2 1 2 2 2 1 2 2 2 1 2 1 1 1 1 1 1 2 2 2 2 1 2 1 2 1 2 1 2 2 2
## [38] 1 2 2 2 2 2 1 2 2 1 1 2 2 2 2 2 1 2 1 1 1 2 2 2 2 1 2 2 1 1 1 2 1 1 2 2 1
## [75] 1 2 2 1 1 2 1 2 2 1 2 1 2 2 2 1 1 1 1 1 1 2 2 2 1 2 2 2 2 2 1 2 2 1 2 2 1
## [112] 2 2 2 2 1 2 2 2 1 2 2 2 2 1 2 2 2 2 1 2 1 2 2 1 1 1 2 2 2 1 1 1 2 2 2 2 1
## [149] 1 1 2 2 1 2 1 2 2 1 1 1 1 2 1 2 2 2 2 2 2 2 1 2 2 1 1 2 1 1 1 2 2 1 2 2 2
## [186] 1 2 2 2 2 2 2 1 2 1 1 2 1 2 1 2 2 2 1 1 2 2 2 1 2 2 1 1 2 1 2 1 1 1 2 2 1
## [223] 1 2 1 2 1 1 2 2 1 2 2 2 2 2 2 2 2 1 2 2 1 1 2 1 1 1 2 2 1 1 1 2 2 2 1 2 1
## [260] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 2 2 2 1 2 2 2 1 2 1 2 2 2 2 2 1 1 1
## [297] 1 2 1 1 2 1 1 1 1 1 2 1 2 1 1 1 2 2 2 2 2 2 1 1 1 2 2 2 1 1 1 2 1 1 2 2 1
## [334] 2 1 1 2 2 2 1 1 2 1 2 1 1 1 1 2 2 2 1 1 2 1 1 2 1 1 2 1 1 2 2 2 1 2 2 2 1
## [371] 1 1 1 1 2 1 1 1 2 1 1 2 2 1 2 1 1 2 2 2 1 2 1 1 1 2 2 2 1 1 1 2 1 1 1 1 2
## [408] 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 2 2 2 1 2 2 1 1 1 2 2 2 1 1 1 2 1 1 1 2 1 1
## [445] 2 2 2 2 2 1 2 2 2 2 1 1 1 1 2 2 2 2 2 1 1 2 1 1 2 1 2 2 2 1 1 1 1 1 1 2 1
## [482] 1 1 1 2 1 2 1 2 2 1 2 1 1 1 1 1 2 2 1 1 2 2 1 1 2 2 2 1 2 1 1 1 1 2 1 1 2
## [519] 2 2 1 1 1 1 1 2 2 1 1 2 2 1 2 2 1 1 1 1 1 1 2 2 1 2 2 1 1 2 2 1 1 1 1 1 2
## [556] 2 1 2 1 2 2 1 1 1 2 2 1 1 1 2 2 1 2 1 2 1 1 2 1 2 1 1 1 2 1 1 2 1 2 2 1 1
## [593] 1 1 1 2 1 2 2 2 1 1 2 2 2 1 1 2 1 1 1 2 2 1 2 1 2 2 1 2 2 2 2 1 1 1 2 1 1
## [630] 1 2 1 2 1 1 2 1 2 2 1 1 1 2 2 2 2 1 2 2 2 1 1 2 2 1 1 1 1 1 2 2 1 1 2 2 2
## [667] 2 2 1 1 2 1 2 2 2 2 1 2 1 2 2 1 1 1 2 2 1 2 2 2 2 1 1 2 1 1 1 1 1 2 1 2 2
## [704] 1 1 2 2 1 2 2 1 1 1 1 2 1 1 2 2 1 2 1 1 2 1 2 2 2 1 1 2 2 2 1 2 2 1 2 2 1
## [741] 2 1 1 1 2 1 2 2 1 2 2 1 2 1 2 1 2 1 1 1 2 1 1 1 2 1 1 1 2 2 2 1 1 2 1 2 2
## [778] 1 2 2 1 1 1 1 1 2 1 2 1 1 1 2 1 2 1 1 2 1 1 2 1 1 1 2 2 2 1 1 1 1 2 1 2 2
## [815] 2 2 1 1 1 1 1 1 1 1 2 1 1 2 1 2 2 1 1 2 1 2 1 1 1 1 1 1 1 2 2 2 1 1 1 2 2
## [852] 1 2 1 2 1 2 2 1 1 2 2 2 2 1 2 1 1 2 2 2 1 2 1 1 1 1 2 1 2 2 1 1 2 1 1 2 2
## [889] 1 1 2 1 1 2 1 2 2 1 2 1 1 1 2 2 2 2 1 2 1 1 1 2 2 2 2 1 1 2 2 1 1 2 1 1 2
## [926] 1 2 1 1 1 2 2 2 1 2 1 2 2 1 2 2 2 1 1 2 2 2 2 1 1 2 1 2 1 1 1 2 1 1 1 1 2
## [963] 1 2 1 1 1 1 1 2 2 2 1 1 1 1 1 1 1 2 1 1 2 2 1 1 2 1 2 1 2 1 2 2 1 2 1 2 2
## [1000] 2 1 1 1 1 1 2 2 1 2 2 2 1 2 2 2 2 1 2 2 1 2 2 2 1 1 1 2 2 1 2 1 1 1 1 1 1
## [1037] 1 1 2 2 1 2 1 1 1 2 1 1 1 1 2 1 1 1 2 2 2 1 1 2 2 1 1 1 1 1 1 2 2 2 2 2 2
## [1074] 2 2 1 1 1 1 1 2 2 2 1 1 1 2 2 1 2 1 1 1 2 2 2 1 2 2 2 1 1 1 1 2 1 1 2 2 2
## [1111] 2 1 1 2 2 2 2 1 2 1 2 2 2 2 1 2 1 2 2 1 2 1 2 1 1 1 1 2 2 2 2 1 1 2 1 1 2
## [1148] 1 2 2 2 2 1 1 1 1 1 1 1 1 2 1 1 2 2 2 1 2 2 2 2 2 2 2 2 1 1 1 1 2 2 1 2 2
## [1185] 1 1 2 1 1 2 2 1 1 1 2 1 1 1 1 2 2 1 1 2 1 2 2 1 2 2 2 2 1 1 1 2 1 1 1 2 2
## [1222] 2 2 1 1 2 2 1 1 2 2 1 2 2 1 1 1 2 1 2 1 1 1 2 2 2 2 1 2 1 1 1 1 2 2 2 1 1
## [1259] 1 2 2 2 1 2 2 2 2 1 1 1 2 1 2 2 1 1 2 2 1 1 2 1 2 1 2 1 1 1 2 1 2 1 1 2 2
## [1296] 1 2 1 1 1 1 1 2 2 1 1 1 2 2 1 2 2 2 2 1 1 2 1 2 2 1 2 2 2 1 2 2 1 1 2 1 1
## [1333] 2 1 1 2 2 1 1 1 2 1 2 1 1 1 1 2 2 2 2 2 1 2 2 1 1 2 2 2 1 1 1 2 2 2 1 1 1
## [1370] 1 2 2 2 2 1 2 1 1 2 2 2 2 1 1 1 1 1 2 2 2 2 2 1 2 2 1 2 2 2 1 2 1 1 1 1 1
## [1407] 1 1 1 1 1 1 2 2 1 2 2 2 1 1 1 1 1 1 1 2 1 2 2 1 2 1 1 2 1 2 2 2 2 2 2 1 2
## [1444] 1 1 1 1 1 1 2 1 2 1 1 1 2 2 1 1 2 1 2 2 2 1 1 2 1 1 1 1 2 2 1 2 1 1 1 2 1
## [1481] 1 2 1 2 2 1 1 2 1 2 2 1 1 2 1 2 2 1 1 1 2 2 2 1 2 2 2 2 1 2 2 2 1 1 1 1 2
## [1518] 1 1 2 1 1 2 2 2 1 1 2 2 2 1 2 2 1 1 1 2 2 1 2 1 1 2 1 2 2 1 1 1 2 1 2 1 2
## [1555] 2 1 2 1 2 1 1 1 2 2 2 1 2 2 2 2 1 2 1 2 2 1 2 2 1 2 2 1 1 2 2 1 1 1 1 1 1
## [1592] 1 2 2 1 1 2 2 1 1 2 1 1 1 1 2 2 2 2 2 2 2 2 1 1 1 1 1 1 2 1 1 2 2 1 2 2 2
## [1629] 2 1 2 2 1 2 2 1 1 1 2 1 1 1 2 2 1 2 2 2 2 1 1 2 1 1 1 1 1 2 1 1 1 2 1 2 2
## [1666] 1 2 1 1 2 2 1 1 2 1 2 2 1 1 2 1 2 1 2 2 1 1 1 1 2 1 2 1 2 2 1 2 1 1 1 2 1
## [1703] 2 1 2 1 1 1 1 1 2 2 2 1 1 1 1 1 2 1 1 1 1 2 2 1 2 1 2 1 1 2 1 2 1 1 2 2 1
## [1740] 2 1 2 2 2 2 2 2 2 1 1 2 2 2 2 1 2 1 1 1 2 1 2 2 1 1 1 2 1 2 2 1 1 1 2 1 1
## [1777] 1 1 2 1 1 1 2 1 2 2 1 2 1 1 1 1 1 2 1 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2
## [1814] 2 1 2 1 1 2 1 2 1 2 1 1 1 1 1 1 2 2 1 2 2 2 2 2 2 1 1 1 2 1 2 2 1 1 1 1 2
## [1851] 1 1 2 1 1 1 1 1 1 1 1 1 2 1 2 2 2 1 1 2 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 1
## [1888] 2 2 2 1 1 2 1 2 2 2 1 2 2 1 1 1 1 1 2 1 1 1 1 1 1 2 1 2 1 2 1 1 2 1 1 2 2
## [1925] 1 2 2 2 2 1 2 1 1 2 2 2 1 2 1 1 2 1 2 1 1 2 2 1 2 1 2 1 1 1 1 2 2 2 1 1 1
## [1962] 1 1 1 1 2 2 2 2 1 1 1 2 1 2 1 2 1 2 1 2 1 1 2 2 2 2 1 1 1 1 1 1 2 1 1 1 1
## [1999] 1 1 1 2 2 2 2 2 2 1 2 1 1 2 1 1 2 1 2 1 2 2 2 1 2 1 2 1 1 1 2 1 2 1 1 1 1
## [2036] 2 2 1 1 1 2 1 1 1 2 2 1 2 2 1 2 2 1 2 2 2 1 2 2 1 2 1 2 1 1 1 2 1 1 2 1 2
## [2073] 1 1 2 1 1 1 1 1 1 1 1 1 1 2 2 2 2 1 1 2 1 1 2 2 1 2 2 1 1 2 2 2 1 2 1 2 1
## [2110] 1 2 2 2 1 1 1 1 2 2 1 2 1 2 1 2 1 2 1 2 2 1 2 1 1 2 1 2 2 1 1 2 1 1 1 1 2
## [2147] 1 1 2 1 2 1 1 2 2 1 2 2 2 1 1 2 2 2 1 1 1 1 2 1 1 2 2 2 2 1 1 1 1 1 2 2 1
## [2184] 1 1 1 2 2 2 1 1 1 2 1 1 1 1 1 2 1 2 2 2 1 2 1 1 1 1 2 1 1 1 1 1 2 2 1 1 2
## [2221] 1 1 2 2 1 1 1 1 1 1 2 1 1 2 1 2 1 2 2 1 2 2 2 1 2 2 2 2 1 2 1 1 2 2 2 2 1
## [2258] 2 1 1 1 2 2 1 1 1 2 2 2 1 1 1 1 2 2 2 2 2 1 2 2 1 2 1 2 2 2 1 2 2 2 1 2 1
## [2295] 2 1 1 1 2 1 2 1 1 2 2 2 2 2 2 2 1 1 2 1 1 2 2 1 1 2 2 2 1 2 2 2 2 2 2 2 2
## [2332] 1 2 2 1 1 2 1 1 1 2 1 1 1 1 2 1 2 2 1 2 2 2 2 2 1 1 2 2 2 2 1 1 1 1 1 1 2
## [2369] 1 2 2 1 2 2 1 1 2 1 1 1 2 1 1 1 2 2 2 2 1 2 2 2 1 2 2 2 1 1 1 1 2 1 1 1 1
## [2406] 1 2 1 2 2 1 1 1 2 2 1 1 2 2 2 2 1 1 2 2 2 2 1 2 2 1
##
## Within cluster sum of squares by cluster:
## [1] 3353 2446
## (between_SS / total_SS = 33.9 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss" "tot.withinss"
## [6] "betweenss" "size" "iter" "ifault"
## [1] 1 2 2 1
## [1] 1264 1167
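In the output above, cluster 2 (1,167 subjects) has the higher mean on all ten reverse-coded items (about 3.5 to 3.9), so those with the highest self-esteem are mostly in cluster 2, while the others are in cluster 1 (1,264 subjects, means of about 2.7 to 3.4). Subjects in the high-esteem cluster tend to agree with positive statements and disagree with negative statements.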
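A sketch of the k-means step on simulated responses is given here. The two toy groups, the choice of `nstart = 25` and the variable names are assumptions. The sketch also identifies the high-scoring cluster from the centers, because the numeric cluster labels are arbitrary and can swap between runs.

```r
set.seed(123)
# Toy responses on 4 items: a lower-scoring and a higher-scoring group.
low  <- matrix(sample(2:3, 50 * 4, replace = TRUE), ncol = 4)
high <- matrix(sample(3:4, 50 * 4, replace = TRUE), ncol = 4)
toy <- rbind(low, high)
colnames(toy) <- paste0("Q", 1:4)

# Several random starts reduce the chance of a poor local optimum.
km <- kmeans(toy, centers = 2, nstart = 25)

km$size     # number of subjects in each cluster
km$centers  # mean response per item in each cluster

# Label of the cluster with the higher average item mean.
high_cluster <- which.max(rowMeans(km$centers))
high_cluster
```

Reading the cluster means rather than the label numbers is what decides which cluster represents higher self-esteem.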
c) Can you visualize the clusters with somewhat clear boundaries? You may try different pairs of variables and different PC pairs of the esteem scores.
We now try to find out which factors are related to self-esteem. PC1 of all the Esteem scores is a good variable to summarize one’s esteem scores, so we take PC1 as our response variable.
EDA the data set first.
Personal information: gender, education (05), log(income) in 87, job type in 87, Weight05 (lb), and HeightFeet05 together with HeightInch05. One way to summarize one’s weight and height is via the Body Mass Index (BMI), which is defined as the body mass divided by the square of the body height and is universally expressed in units of kg/m². Note that you need to create BMI first; you may then include it as a possible predictor.
## Subject Gender Education05 Income87
## 1 2 female 12 16000
## 2 6 male 16 18000
## 3 7 male 12 0
## 4 8 female 14 9000
## 5 9 male 14 15000
## 6 13 male 16 2200
## Job05 Income05
## 1 4700 TO 4960: Sales and Related Workers 5500
## 2 10 TO 430: Executive, Administrative and Managerial Occupations 65000
## 3 7900 TO 8960: Setters, Operators and Tenders 19000
## 4 5000 TO 5930: Office and Administrative Support Workers 36000
## 5 10 TO 430: Executive, Administrative and Managerial Occupations 65000
## 6 4200 TO 4250: Cleaning and Building Service Occupations 8000
## Weight05 HeightFeet05 HeightInch05 Imagazine Inewspaper Ilibrary MotherEd
## 1 160 5 2 1 1 1 5
## 2 187 5 5 0 1 1 12
## 3 175 5 9 1 1 1 12
## 4 246 5 3 1 1 1 9
## 5 180 5 6 1 1 1 12
## 6 235 6 0 1 1 1 12
## FatherEd FamilyIncome78 Science Arith Word Parag Number Coding Auto Math
## 1 8 20000 6 8 15 6 29 52 9 6
## 2 12 35000 23 30 35 15 45 68 21 23
## 3 12 8502 14 14 27 8 32 35 13 11
## 4 6 7227 18 13 35 12 24 48 11 4
## 5 10 17000 17 21 28 10 40 46 13 13
## 6 16 20000 16 30 29 13 36 30 21 24
## Mechanic Elec AFQT Esteem81_1 Esteem81_2 Esteem81_3 Esteem81_4 Esteem81_5
## 1 10 5 6.84 1 1 4 1 3
## 2 21 19 99.39 2 1 4 2 4
## 3 9 11 47.41 2 1 3 2 3
## 4 12 12 44.02 1 1 3 2 3
## 5 13 15 59.68 1 1 4 1 1
## 6 19 16 72.31 1 1 4 1 4
## Esteem81_6 Esteem81_7 Esteem81_8 Esteem81_9 Esteem81_10 Esteem87_1 Esteem87_2
## 1 3 1 3 3 3 2 1
## 2 2 2 4 3 4 1 1
## 3 2 2 2 3 3 2 2
## 4 2 3 3 3 3 1 1
## 5 1 1 4 4 4 1 1
## 6 1 1 4 4 4 1 1
## Esteem87_3 Esteem87_4 Esteem87_5 Esteem87_6 Esteem87_7 Esteem87_8 Esteem87_9
## 1 4 1 2 2 2 3 3
## 2 4 1 4 1 2 3 2
## 3 4 2 4 2 2 4 3
## 4 3 1 4 2 1 2 2
## 5 4 1 4 1 1 4 4
## 6 4 1 4 1 2 4 4
## Esteem87_10 PC1.esteem ft2m in2m totalheight kg bmi
## 1 4 -0.314 1.52 0.127 1.65 72.6 26.6
## 2 4 0.461 1.52 0.127 1.65 84.8 31.1
## 3 4 0.199 1.52 0.127 1.65 79.4 29.1
## 4 2 -0.976 1.52 0.127 1.65 111.6 40.9
## 5 4 1.949 1.52 0.127 1.65 81.6 30.0
## 6 4 1.650 1.83 0.152 1.98 106.6 27.2
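As a check on the BMI construction, the sketch below recomputes height in metres and BMI for the six printed rows. In the output above, `in2m` is 0.127 for rows 1 to 5 even though `HeightInch05` differs across them (2, 5, 9, 3, 6), and it is 0.152 in row 6 where `HeightInch05` is 0. Those values equal `HeightFeet05 * 0.0254`, which suggests the inch conversion may have been applied to the feet column. The names `hw`, `height_m`, `weight_kg` and `bmi_check` are introduced here for illustration only.

```r
# Height and weight columns for the six printed rows.
hw <- data.frame(
  Weight05     = c(160, 187, 175, 246, 180, 235),
  HeightFeet05 = c(5, 5, 5, 5, 5, 6),
  HeightInch05 = c(2, 5, 9, 3, 6, 0)
)

# Convert to metric: 1 ft = 0.3048 m, 1 in = 0.0254 m, 1 lb = 0.45359237 kg.
hw$height_m  <- hw$HeightFeet05 * 0.3048 + hw$HeightInch05 * 0.0254
hw$weight_kg <- hw$Weight05 * 0.45359237

# BMI = mass (kg) / height (m)^2.
hw$bmi_check <- hw$weight_kg / hw$height_m^2
round(hw, 2)
```

If the recomputed values differ from the `bmi` column in the full data, the regressions that use `bmi` would need to be rerun with the corrected variable.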
- Household environment: Imagazine, Inewspaper, Ilibrary, MotherEd, FatherEd, FamilyIncome78. Do set indicators `Imagazine`, `Inewspaper` and `Ilibrary` as factors.
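A minimal sketch of the factor conversion, using the indicator values from the six printed rows (the data frame name `env` is hypothetical). Fixing the levels to `0` and `1` makes the regression coefficients appear as `Imagazine1`, `Inewspaper1` and `Ilibrary1`.

```r
# Indicator columns for the six printed rows (1 = present in the household).
env <- data.frame(
  Imagazine  = c(1, 0, 1, 1, 1, 1),
  Inewspaper = c(1, 1, 1, 1, 1, 1),
  Ilibrary   = c(1, 1, 1, 1, 1, 1)
)

# Convert all three indicators to factors with explicit levels 0 and 1.
ind_cols <- c("Imagazine", "Inewspaper", "Ilibrary")
env[ind_cols] <- lapply(env[ind_cols], factor, levels = c(0, 1))
str(env)
```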
- You may use PC1 of the ASVAB scores as a measure of intelligence.
**Variables Related to ASVAB test Scores in 1981**
| Test | Description |
|---|---|
| AFQT | percentile score on the AFQT intelligence test in 1981 |
| Coding | score on the Coding Speed test in 1981 |
| Auto | score on the Automotive and Shop test in 1981 |
| Mechanic | score on the Mechanic test in 1981 |
| Elec | score on the Electronics Information test in 1981 |
| Science | score on the General Science test in 1981 |
| Math | score on the Math test in 1981 |
| Arith | score on the Arithmetic Reasoning test in 1981 |
| Word | score on the Word Knowledge Test in 1981 |
| Parag | score on the Paragraph Comprehension test in 1981 |
| Number | score on the Numerical Operations test in 1981 |
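The PC1 construction can be sketched as follows. The data here are simulated from a single hypothetical latent factor (`ability`), and the seven column names match the variables that appear in the loadings printed below. Centering and scaling before PCA is an assumption of this sketch.

```r
# Simulated test scores driven by one latent factor (hypothetical data).
set.seed(2)
n <- 300
ability <- rnorm(n)
tests <- c("AFQT", "Coding", "Auto", "Mechanic", "Elec", "Science", "Number")
asvab_sim <- sapply(tests, function(v) 10 + 3 * ability + rnorm(n, sd = 2))

# PCA on centered and scaled scores.
pca_asvab <- prcomp(asvab_sim, center = TRUE, scale. = TRUE)

# The sign of a principal component is arbitrary; flip PC1 if needed so that
# a higher score corresponds to higher test results.
if (sum(pca_asvab$rotation[, 1]) < 0) {
  pca_asvab$rotation[, 1] <- -pca_asvab$rotation[, 1]
  pca_asvab$x[, 1] <- -pca_asvab$x[, 1]
}

# PC1 scores serve as the summary variable; also report variance explained.
intelligence_sim <- pca_asvab$x[, 1]
pve <- pca_asvab$sdev^2 / sum(pca_asvab$sdev^2)
round(pve, 3)
```

Checking the proportion of variance explained by PC1 is a quick way to judge whether a single component is a reasonable summary of the test scores.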
## [1] "sdev" "rotation" "center" "scale" "x"
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## AFQT 0.419 -0.1593 0.5657 -0.00548 0.2262 0.0339 -0.6534
## Coding 0.267 -0.6096 -0.4191 0.61556 0.0374 0.0103 -0.0351
## Auto 0.367 0.3977 -0.5317 -0.14377 -0.2040 0.4771 -0.3666
## Mechanic 0.407 0.2632 -0.1639 -0.05398 0.7819 -0.1529 0.3177
## Elec 0.413 0.2701 -0.0264 0.10900 -0.4416 -0.7409 -0.0166
## Science 0.430 0.0825 0.4259 0.22643 -0.2821 0.4434 0.5474
## Number 0.314 -0.5440 -0.1125 -0.73097 -0.1410 -0.0472 0.1911
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## [1,] -3.145 -0.810 -1.2495 0.201 0.5542 -0.1280 -0.0721
## [2,] 3.678 -0.161 -0.1021 0.411 -0.1639 -0.2839 -0.2384
## [3,] -1.214 0.257 0.3210 -0.268 -0.6247 -0.0569 -0.3776
## [4,] -0.724 0.327 0.4378 1.074 -0.3170 -0.0932 0.3009
## [5,] 0.404 -0.162 0.2939 -0.198 -0.6050 -0.6364 0.0358
## [6,] 1.232 1.577 -0.0519 -0.858 0.0782 -0.3488 -0.5978
## AFQT Coding Auto Mechanic Elec Science Number PC1 PC2 PC3 PC4
## 1 1.61 1.308 1.27 2.07 1.81 1.83 1.438 4.31 -0.149 -0.0804 0.0620
## 2 1.54 0.390 1.83 2.07 2.05 1.83 1.240 4.28 0.821 -0.0226 -0.4128
## 3 1.56 0.718 1.83 1.68 2.05 1.83 1.339 4.25 0.460 -0.0925 -0.2622
## 4 1.32 1.308 1.64 1.88 2.05 1.83 0.944 4.19 0.331 -0.3672 0.4082
## 5 1.27 0.587 2.02 2.07 2.05 1.83 0.845 4.17 1.034 -0.3118 -0.0281
## 6 1.30 1.899 1.83 2.07 1.56 1.20 1.339 4.15 -0.299 -1.0540 0.2493
## PC5 PC6 PC7
## 1 0.2583 -0.2418 0.3372
## 2 0.0118 -0.1563 0.1704
## 3 -0.2905 -0.0968 0.0366
## 4 -0.0769 -0.2001 0.2328
## 5 -0.0239 -0.0549 0.1936
## 6 0.3942 -0.0704 -0.0462
## Subject Gender Education05 Income87
## 1 2 female 12 16000
## 2 6 male 16 18000
## 3 7 male 12 0
## 4 8 female 14 9000
## 5 9 male 14 15000
## 6 13 male 16 2200
## Job05 Income05
## 1 4700 TO 4960: Sales and Related Workers 5500
## 2 10 TO 430: Executive, Administrative and Managerial Occupations 65000
## 3 7900 TO 8960: Setters, Operators and Tenders 19000
## 4 5000 TO 5930: Office and Administrative Support Workers 36000
## 5 10 TO 430: Executive, Administrative and Managerial Occupations 65000
## 6 4200 TO 4250: Cleaning and Building Service Occupations 8000
## Weight05 HeightFeet05 HeightInch05 Imagazine Inewspaper Ilibrary MotherEd
## 1 160 5 2 1 1 1 5
## 2 187 5 5 0 1 1 12
## 3 175 5 9 1 1 1 12
## 4 246 5 3 1 1 1 9
## 5 180 5 6 1 1 1 12
## 6 235 6 0 1 1 1 12
## FatherEd FamilyIncome78 Science Arith Word Parag Number Coding Auto Math
## 1 8 20000 6 8 15 6 29 52 9 6
## 2 12 35000 23 30 35 15 45 68 21 23
## 3 12 8502 14 14 27 8 32 35 13 11
## 4 6 7227 18 13 35 12 24 48 11 4
## 5 10 17000 17 21 28 10 40 46 13 13
## 6 16 20000 16 30 29 13 36 30 21 24
## Mechanic Elec AFQT Esteem81_1 Esteem81_2 Esteem81_3 Esteem81_4 Esteem81_5
## 1 10 5 6.84 1 1 4 1 3
## 2 21 19 99.39 2 1 4 2 4
## 3 9 11 47.41 2 1 3 2 3
## 4 12 12 44.02 1 1 3 2 3
## 5 13 15 59.68 1 1 4 1 1
## 6 19 16 72.31 1 1 4 1 4
## Esteem81_6 Esteem81_7 Esteem81_8 Esteem81_9 Esteem81_10 Esteem87_1 Esteem87_2
## 1 3 1 3 3 3 2 1
## 2 2 2 4 3 4 1 1
## 3 2 2 2 3 3 2 2
## 4 2 3 3 3 3 1 1
## 5 1 1 4 4 4 1 1
## 6 1 1 4 4 4 1 1
## Esteem87_3 Esteem87_4 Esteem87_5 Esteem87_6 Esteem87_7 Esteem87_8 Esteem87_9
## 1 4 1 2 2 2 3 3
## 2 4 1 4 1 2 3 2
## 3 4 2 4 2 2 4 3
## 4 3 1 4 2 1 2 2
## 5 4 1 4 1 1 4 4
## 6 4 1 4 1 2 4 4
## Esteem87_10 PC1.esteem ft2m in2m totalheight kg bmi intelligence
## 1 4 -0.314 1.52 0.127 1.65 72.6 26.6 -3.145
## 2 4 0.461 1.52 0.127 1.65 84.8 31.1 3.678
## 3 4 0.199 1.52 0.127 1.65 79.4 29.1 -1.214
## 4 2 -0.976 1.52 0.127 1.65 111.6 40.9 -0.724
## 5 4 1.949 1.52 0.127 1.65 81.6 30.0 0.404
## 6 4 1.650 1.83 0.152 1.98 106.6 27.2 1.232
b) Run a few regression models between PC1 of all the esteem scores and suitable variables listed in a). Find a final best model with your own criterion.
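One possible criterion is to compare candidate models by adjusted R² and AIC. The sketch below does this on simulated data (the data frame `sim` and its coefficients are hypothetical). It first restricts to rows that are complete for every candidate predictor, because AIC values are only comparable between models fit to the same observations.

```r
# Simulated data with some missing log incomes (hypothetical values).
set.seed(3)
n <- 400
sim <- data.frame(
  Education05 = sample(8:20, n, replace = TRUE),
  logincome   = rnorm(n, mean = 9.5, sd = 1),
  Gender      = factor(sample(c("female", "male"), n, replace = TRUE))
)
sim$PC1.esteem <- -3 + 0.1 * sim$Education05 + 0.2 * sim$logincome +
  rnorm(n, sd = 1.2)
sim$logincome[sample(n, 40)] <- NA

# Keep only rows complete for all candidate variables.
vars <- c("PC1.esteem", "Gender", "Education05", "logincome")
cc <- sim[complete.cases(sim[vars]), ]

# Candidate models, all fit on the same rows.
fits <- list(
  gender         = lm(PC1.esteem ~ Gender, data = cc),
  gender_edu     = lm(PC1.esteem ~ Gender + Education05, data = cc),
  gender_edu_inc = lm(PC1.esteem ~ Gender + Education05 + logincome, data = cc)
)

# Summarize sample size, adjusted R-squared and AIC for each model.
comparison <- data.frame(
  n      = sapply(fits, nobs),
  adj_r2 = sapply(fits, function(m) summary(m)$adj.r.squared),
  AIC    = sapply(fits, AIC)
)
comparison[order(comparison$AIC), ]
```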
Model with `bmi` as the only predictor:
##
## Call:
## lm(formula = PC1.esteem ~ bmi, data = nlsy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.428 -1.107 -0.074 1.130 2.163
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.19605 0.10413 1.88 0.060 .
## bmi -0.00691 0.00355 -1.95 0.052 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.3 on 2429 degrees of freedom
## Multiple R-squared: 0.00156, Adjusted R-squared: 0.00115
## F-statistic: 3.79 on 1 and 2429 DF, p-value: 0.0518
Model with `intelligence` as the only predictor:
##
## Call:
## lm(formula = PC1.esteem ~ intelligence, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.219 -0.983 -0.041 1.030 2.941
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.81e-16 2.54e-02 0.0 1
## intelligence 1.68e-01 1.24e-02 13.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.25 on 2429 degrees of freedom
## Multiple R-squared: 0.0702, Adjusted R-squared: 0.0698
## F-statistic: 183 on 1 and 2429 DF, p-value: <2e-16
Model with `Gender` as the only predictor:
##
## Call:
## lm(formula = PC1.esteem ~ Gender, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.516 -1.081 -0.067 1.096 2.029
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.0798 0.0374 -2.13 0.0330 *
## Gendermale 0.1575 0.0525 3.00 0.0028 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.3 on 2429 degrees of freedom
## Multiple R-squared: 0.00368, Adjusted R-squared: 0.00327
## F-statistic: 8.98 on 1 and 2429 DF, p-value: 0.00275
Since we saw slight differences in esteem scores across the gender and newspaper indicators, we can fit models using these features.
##
## Call:
## lm(formula = PC1.esteem ~ Inewspaper, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.512 -1.070 -0.101 1.087 2.403
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.4539 0.0698 -6.51 9.4e-11 ***
## Inewspaper1 0.5274 0.0752 7.01 3.0e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.28 on 2429 degrees of freedom
## Multiple R-squared: 0.0198, Adjusted R-squared: 0.0194
## F-statistic: 49.2 on 1 and 2429 DF, p-value: 3.02e-12
##
## Call:
## lm(formula = PC1.esteem ~ Inewspaper + Gender, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.587 -1.043 -0.076 1.104 2.478
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5285 0.0742 -7.13 1.4e-12 ***
## Inewspaper1 0.5244 0.0751 6.98 3.7e-12 ***
## Gendermale 0.1525 0.0520 2.93 0.0034 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.28 on 2428 degrees of freedom
## Multiple R-squared: 0.0233, Adjusted R-squared: 0.0225
## F-statistic: 29 on 2 and 2428 DF, p-value: 3.73e-13
These results may warrant further investigation into a potential linear relationship between gender, having a newspaper at home, and esteem scores, although the two predictors together explain only about 2% of the variation (R² = 0.0233).
Personal information: gender, education (05), log(income) in 87, job type in 87.
Combining personal info
##
## Call:
## lm(formula = PC1.esteem ~ Gender + Education05, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.668 -0.999 -0.049 1.054 2.505
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.9091 0.1467 -13.01 < 2e-16 ***
## Gendermale 0.1768 0.0509 3.47 0.00052 ***
## Education05 0.1308 0.0102 12.86 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.25 on 2428 degrees of freedom
## Multiple R-squared: 0.0672, Adjusted R-squared: 0.0665
## F-statistic: 87.5 on 2 and 2428 DF, p-value: <2e-16
##
## Call:
## lm(formula = PC1.esteem ~ Gender + Education05 + logincome, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.664 -0.979 -0.027 1.033 2.794
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.8249 0.2892 -13.23 <2e-16 ***
## Gendermale 0.0356 0.0553 0.64 0.52
## Education05 0.1136 0.0109 10.39 <2e-16 ***
## logincome 0.2420 0.0281 8.62 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.24 on 2121 degrees of freedom
## Multiple R-squared: 0.0873, Adjusted R-squared: 0.086
## F-statistic: 67.6 on 3 and 2121 DF, p-value: <2e-16
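Here we see that the effect of male gender becomes insignificant (p = 0.52) once we control for log(income) in 87. Note that this model is fit on 2,125 observations (2,121 residual degrees of freedom plus 4 coefficients) rather than the 2,431 used in the previous model, so the two fits are not based on the same sample.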
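The drop in sample size is consistent with zero incomes being excluded, since `log(0)` is `-Inf` and `lm()` cannot use such values. The sketch below assumes that zero incomes were set to missing (how `logincome` was actually built is not shown here) and recovers the sample sizes implied by the printed degrees of freedom.

```r
# Income87 for the six printed rows; the third person reports zero income.
income87 <- c(16000, 18000, 0, 9000, 15000, 2200)
log(income87)  # the third value is -Inf

# Assumed handling: treat non-positive incomes as missing before taking logs.
logincome <- ifelse(income87 > 0, log(income87), NA_real_)
sum(!is.na(logincome))

# Sample sizes implied by the summaries: residual df + number of coefficients.
n_gender_edu     <- 2428 + 3
n_with_logincome <- 2121 + 4
n_gender_edu - n_with_logincome  # observations lost when logincome is added
```

Refitting the `Gender + Education05` model on the smaller sample would show whether the change in the gender coefficient comes from controlling for income or from the change in sample.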
##
## Call:
## lm(formula = PC1.esteem ~ Gender + Education05 + Job05, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.266 -0.950 -0.050 0.985 2.886
##
## Coefficients:
## Estimate
## (Intercept) -1.7106
## Gendermale 0.2189
## Education05 0.1016
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 0.4637
## Job051000 TO 1240: Mathematical and Computer Scientists 0.4543
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians -0.0342
## Job051600 TO 1760: Physical Scientists -0.9276
## Job051800 TO 1860: Social Scientists and Related Workers -0.2495
## Job051900 TO 1960: Life, Physical and Social Science Technicians 0.2351
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 0.0279
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 0.2836
## Job052200 TO 2340: Teachers 0.1556
## Job052400 TO 2550: Education, Training and Library Workers 0.1738
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 0.7372
## Job052800 TO 2960: Media and Communications Workers 0.3706
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 0.4270
## Job053300 TO 3650: Health Care Technical and Support Occupations -0.2050
## Job053700 TO 3950: Protective Service Occupations 0.5558
## Job054000 TO 4160: Food Preparation and Serving Related Occupations -0.1711
## Job054200 TO 4250: Cleaning and Building Service Occupations -0.3592
## Job054300 TO 4430: Entertainment Attendants and Related Workers -0.8019
## Job054500 TO 4650: Personal Care and Service Workers 0.2316
## Job054700 TO 4960: Sales and Related Workers 0.2459
## Job05500 TO 950: Management Related Occupations 0.5166
## Job055000 TO 5930: Office and Administrative Support Workers 0.2801
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations -0.4452
## Job056200 TO 6940: Construction Trade and Extraction Workers -0.0795
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 0.0947
## Job057700 TO 7750: Production and Operating Workers 0.1245
## Job057800 TO 7850: Food Preparation Occupations 0.1228
## Job057900 TO 8960: Setters, Operators and Tenders 0.0719
## Job059000 TO 9750: Transportation and Material Moving Workers -0.1841
## Job059990: Uncodeable -0.0968
## Std. Error
## (Intercept) 0.2326
## Gendermale 0.0591
## Education05 0.0122
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 0.1778
## Job051000 TO 1240: Mathematical and Computer Scientists 0.2276
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians 0.2384
## Job051600 TO 1760: Physical Scientists 0.6408
## Job051800 TO 1860: Social Scientists and Related Workers 0.5324
## Job051900 TO 1960: Life, Physical and Social Science Technicians 0.4955
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 0.2563
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 0.3621
## Job052200 TO 2340: Teachers 0.2054
## Job052400 TO 2550: Education, Training and Library Workers 0.2837
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 0.3023
## Job052800 TO 2960: Media and Communications Workers 0.3811
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 0.2221
## Job053300 TO 3650: Health Care Technical and Support Occupations 0.2072
## Job053700 TO 3950: Protective Service Occupations 0.2365
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 0.2234
## Job054200 TO 4250: Cleaning and Building Service Occupations 0.2248
## Job054300 TO 4430: Entertainment Attendants and Related Workers 0.4243
## Job054500 TO 4650: Personal Care and Service Workers 0.2528
## Job054700 TO 4960: Sales and Related Workers 0.1866
## Job05500 TO 950: Management Related Occupations 0.2045
## Job055000 TO 5930: Office and Administrative Support Workers 0.1780
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations 0.4459
## Job056200 TO 6940: Construction Trade and Extraction Workers 0.2000
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 0.2068
## Job057700 TO 7750: Production and Operating Workers 0.2427
## Job057800 TO 7850: Food Preparation Occupations 0.6402
## Job057900 TO 8960: Setters, Operators and Tenders 0.2041
## Job059000 TO 9750: Transportation and Material Moving Workers 0.2028
## Job059990: Uncodeable 1.2476
## t value
## (Intercept) -7.35
## Gendermale 3.70
## Education05 8.35
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 2.61
## Job051000 TO 1240: Mathematical and Computer Scientists 2.00
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians -0.14
## Job051600 TO 1760: Physical Scientists -1.45
## Job051800 TO 1860: Social Scientists and Related Workers -0.47
## Job051900 TO 1960: Life, Physical and Social Science Technicians 0.47
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 0.11
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 0.78
## Job052200 TO 2340: Teachers 0.76
## Job052400 TO 2550: Education, Training and Library Workers 0.61
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 2.44
## Job052800 TO 2960: Media and Communications Workers 0.97
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 1.92
## Job053300 TO 3650: Health Care Technical and Support Occupations -0.99
## Job053700 TO 3950: Protective Service Occupations 2.35
## Job054000 TO 4160: Food Preparation and Serving Related Occupations -0.77
## Job054200 TO 4250: Cleaning and Building Service Occupations -1.60
## Job054300 TO 4430: Entertainment Attendants and Related Workers -1.89
## Job054500 TO 4650: Personal Care and Service Workers 0.92
## Job054700 TO 4960: Sales and Related Workers 1.32
## Job05500 TO 950: Management Related Occupations 2.53
## Job055000 TO 5930: Office and Administrative Support Workers 1.57
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations -1.00
## Job056200 TO 6940: Construction Trade and Extraction Workers -0.40
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 0.46
## Job057700 TO 7750: Production and Operating Workers 0.51
## Job057800 TO 7850: Food Preparation Occupations 0.19
## Job057900 TO 8960: Setters, Operators and Tenders 0.35
## Job059000 TO 9750: Transportation and Material Moving Workers -0.91
## Job059990: Uncodeable -0.08
## Pr(>|t|)
## (Intercept) 2.6e-13
## Gendermale 0.00022
## Education05 < 2e-16
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 0.00917
## Job051000 TO 1240: Mathematical and Computer Scientists 0.04599
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians 0.88593
## Job051600 TO 1760: Physical Scientists 0.14791
## Job051800 TO 1860: Social Scientists and Related Workers 0.63933
## Job051900 TO 1960: Life, Physical and Social Science Technicians 0.63531
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 0.91332
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 0.43368
## Job052200 TO 2340: Teachers 0.44878
## Job052400 TO 2550: Education, Training and Library Workers 0.54007
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 0.01480
## Job052800 TO 2960: Media and Communications Workers 0.33092
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 0.05460
## Job053300 TO 3650: Health Care Technical and Support Occupations 0.32261
## Job053700 TO 3950: Protective Service Occupations 0.01887
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 0.44390
## Job054200 TO 4250: Cleaning and Building Service Occupations 0.11017
## Job054300 TO 4430: Entertainment Attendants and Related Workers 0.05890
## Job054500 TO 4650: Personal Care and Service Workers 0.35962
## Job054700 TO 4960: Sales and Related Workers 0.18769
## Job05500 TO 950: Management Related Occupations 0.01159
## Job055000 TO 5930: Office and Administrative Support Workers 0.11568
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations 0.31817
## Job056200 TO 6940: Construction Trade and Extraction Workers 0.69095
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 0.64696
## Job057700 TO 7750: Production and Operating Workers 0.60798
## Job057800 TO 7850: Food Preparation Occupations 0.84791
## Job057900 TO 8960: Setters, Operators and Tenders 0.72460
## Job059000 TO 9750: Transportation and Material Moving Workers 0.36405
## Job059990: Uncodeable 0.93817
##
## (Intercept) ***
## Gendermale ***
## Education05 ***
## Job0510 TO 430: Executive, Administrative and Managerial Occupations **
## Job051000 TO 1240: Mathematical and Computer Scientists *
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians
## Job051600 TO 1760: Physical Scientists
## Job051800 TO 1860: Social Scientists and Related Workers
## Job051900 TO 1960: Life, Physical and Social Science Technicians
## Job052000 TO 2060: Counselors, Sociala and Religious Workers
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers
## Job052200 TO 2340: Teachers
## Job052400 TO 2550: Education, Training and Library Workers
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers *
## Job052800 TO 2960: Media and Communications Workers
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners .
## Job053300 TO 3650: Health Care Technical and Support Occupations
## Job053700 TO 3950: Protective Service Occupations *
## Job054000 TO 4160: Food Preparation and Serving Related Occupations
## Job054200 TO 4250: Cleaning and Building Service Occupations
## Job054300 TO 4430: Entertainment Attendants and Related Workers .
## Job054500 TO 4650: Personal Care and Service Workers
## Job054700 TO 4960: Sales and Related Workers
## Job05500 TO 950: Management Related Occupations *
## Job055000 TO 5930: Office and Administrative Support Workers
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations
## Job056200 TO 6940: Construction Trade and Extraction Workers
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers
## Job057700 TO 7750: Production and Operating Workers
## Job057800 TO 7850: Food Preparation Occupations
## Job057900 TO 8960: Setters, Operators and Tenders
## Job059000 TO 9750: Transportation and Material Moving Workers
## Job059990: Uncodeable
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.24 on 2398 degrees of freedom
## Multiple R-squared: 0.104, Adjusted R-squared: 0.0924
## F-statistic: 8.73 on 32 and 2398 DF, p-value: <2e-16
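`Job05` contributes 30 coefficients, most of which are not individually significant, so it is more informative to test the factor as a whole. A minimal sketch of such a partial F-test with `anova()` on simulated data follows (the data frame `job_sim`, the four job categories and their effects are hypothetical).

```r
# Simulated data with a four-level job factor (hypothetical values).
set.seed(4)
n <- 300
job_sim <- data.frame(
  Gender      = factor(sample(c("female", "male"), n, replace = TRUE)),
  Education05 = sample(8:20, n, replace = TRUE),
  Job05       = factor(sample(c("sales", "office", "management", "production"),
                              n, replace = TRUE))
)
job_effect <- c(sales = 0, office = 0.1, management = 0.5, production = -0.2)
job_sim$PC1.esteem <- -1.5 + 0.1 * job_sim$Education05 +
  unname(job_effect[as.character(job_sim$Job05)]) + rnorm(n, sd = 1.2)

# Nested models: without and with the job factor.
reduced <- lm(PC1.esteem ~ Gender + Education05, data = job_sim)
full    <- lm(PC1.esteem ~ Gender + Education05 + Job05, data = job_sim)

# Partial F-test: does Job05 as a whole improve the fit?
job_test <- anova(reduced, full)
job_test
```

The second row of the table reports the F statistic and p-value for adding all job levels at once.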
These models explain only about 7% to 10% of the variation in the esteem data, however (R² between 0.067 and 0.104). Let’s look at household environment features.
- Household environment: Imagazine, Inewspaper, Ilibrary, MotherEd, FatherEd, FamilyIncome78. Do set indicators `Imagazine`, `Inewspaper` and `Ilibrary` as factors.
##
## Call:
## lm(formula = PC1.esteem ~ Imagazine + Inewspaper + Ilibrary +
## MotherEd + FatherEd + FamilyIncome78, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.504 -0.986 -0.067 1.048 2.834
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.29e+00 1.23e-01 -10.47 < 2e-16 ***
## Imagazine1 1.52e-01 6.23e-02 2.44 0.01479 *
## Inewspaper1 2.19e-01 8.09e-02 2.70 0.00693 **
## Ilibrary1 1.16e-01 6.44e-02 1.80 0.07187 .
## MotherEd 4.38e-02 1.28e-02 3.41 0.00065 ***
## FatherEd 2.29e-02 9.57e-03 2.39 0.01675 *
## FamilyIncome78 5.48e-06 2.02e-06 2.72 0.00661 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.26 on 2424 degrees of freedom
## Multiple R-squared: 0.056, Adjusted R-squared: 0.0536
## F-statistic: 23.9 on 6 and 2424 DF, p-value: <2e-16
##
## Call:
## lm(formula = PC1.esteem ~ Inewspaper + MotherEd + Ilibrary +
## FamilyIncome78, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.455 -1.011 -0.051 1.065 2.952
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.26e+00 1.23e-01 -10.29 < 2e-16 ***
## Inewspaper1 2.69e-01 7.98e-02 3.37 0.00077 ***
## MotherEd 6.55e-02 1.09e-02 6.02 2e-09 ***
## Ilibrary1 1.47e-01 6.39e-02 2.31 0.02115 *
## FamilyIncome78 7.02e-06 1.98e-06 3.55 0.00039 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.26 on 2426 degrees of freedom
## Multiple R-squared: 0.0508, Adjusted R-squared: 0.0492
## F-statistic: 32.4 on 4 and 2426 DF, p-value: <2e-16
Combining more personal info with home environment
##
## Call:
## lm(formula = PC1.esteem ~ Gender + Education05 + logincome +
## Inewspaper + MotherEd + FatherEd + Ilibrary + FamilyIncome78,
## data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.651 -0.973 -0.034 1.011 2.777
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.97e+00 2.95e-01 -13.48 < 2e-16 ***
## Gendermale 3.30e-02 5.52e-02 0.60 0.551
## Education05 8.78e-02 1.23e-02 7.15 1.2e-12 ***
## logincome 2.24e-01 2.83e-02 7.90 4.4e-15 ***
## Inewspaper1 1.88e-01 8.53e-02 2.21 0.027 *
## MotherEd 2.57e-02 1.35e-02 1.90 0.057 .
## FatherEd 8.72e-03 9.99e-03 0.87 0.383
## Ilibrary1 8.22e-02 6.73e-02 1.22 0.222
## FamilyIncome78 2.11e-06 2.07e-06 1.02 0.309
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.23 on 2116 degrees of freedom
## Multiple R-squared: 0.0983, Adjusted R-squared: 0.0949
## F-statistic: 28.9 on 8 and 2116 DF, p-value: <2e-16
##
## Call:
## lm(formula = PC1.esteem ~ Gender + logincome + FatherEd + MotherEd +
## Imagazine + bmi, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.530 -0.986 -0.002 1.010 2.950
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.05281 0.30119 -10.14 <2e-16 ***
## Gendermale -0.00106 0.05613 -0.02 0.9850
## logincome 0.24051 0.02835 8.48 <2e-16 ***
## FatherEd 0.02643 0.00976 2.71 0.0068 **
## MotherEd 0.04831 0.01333 3.62 0.0003 ***
## Imagazine1 0.16924 0.06536 2.59 0.0097 **
## bmi -0.00562 0.00363 -1.55 0.1217
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.25 on 2118 degrees of freedom
## Multiple R-squared: 0.0747, Adjusted R-squared: 0.0721
## F-statistic: 28.5 on 6 and 2118 DF, p-value: <2e-16
##
## Call:
## lm(formula = PC1.esteem ~ Gender + logincome + MotherEd + Job05 +
## bmi + intelligence + FamilyIncome78, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.957 -0.922 -0.002 1.018 2.984
##
## Coefficients:
## Estimate
## (Intercept) -2.38e+00
## Gendermale 3.45e-02
## logincome 1.94e-01
## MotherEd 3.37e-02
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 5.12e-01
## Job051000 TO 1240: Mathematical and Computer Scientists 4.96e-01
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians -1.13e-01
## Job051600 TO 1760: Physical Scientists -7.92e-01
## Job051800 TO 1860: Social Scientists and Related Workers -8.26e-02
## Job051900 TO 1960: Life, Physical and Social Science Technicians 3.10e-01
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 3.30e-01
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 4.97e-01
## Job052200 TO 2340: Teachers 4.32e-01
## Job052400 TO 2550: Education, Training and Library Workers 3.83e-01
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 7.57e-01
## Job052800 TO 2960: Media and Communications Workers 6.73e-01
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 4.99e-01
## Job053300 TO 3650: Health Care Technical and Support Occupations -1.04e-01
## Job053700 TO 3950: Protective Service Occupations 6.88e-01
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 4.00e-02
## Job054200 TO 4250: Cleaning and Building Service Occupations -3.77e-02
## Job054300 TO 4430: Entertainment Attendants and Related Workers -5.65e-01
## Job054500 TO 4650: Personal Care and Service Workers 1.77e-01
## Job054700 TO 4960: Sales and Related Workers 3.02e-01
## Job05500 TO 950: Management Related Occupations 6.15e-01
## Job055000 TO 5930: Office and Administrative Support Workers 3.86e-01
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations -2.74e-01
## Job056200 TO 6940: Construction Trade and Extraction Workers 2.20e-02
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 7.45e-02
## Job057700 TO 7750: Production and Operating Workers 9.97e-02
## Job057800 TO 7850: Food Preparation Occupations 1.91e-01
## Job057900 TO 8960: Setters, Operators and Tenders 1.71e-01
## Job059000 TO 9750: Transportation and Material Moving Workers -6.68e-02
## Job059990: Uncodeable -2.23e-01
## bmi -5.59e-03
## intelligence 8.36e-02
## FamilyIncome78 2.26e-06
## Std. Error
## (Intercept) 3.64e-01
## Gendermale 6.60e-02
## logincome 2.88e-02
## MotherEd 1.19e-02
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 1.91e-01
## Job051000 TO 1240: Mathematical and Computer Scientists 2.44e-01
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians 2.54e-01
## Job051600 TO 1760: Physical Scientists 6.40e-01
## Job051800 TO 1860: Social Scientists and Related Workers 5.31e-01
## Job051900 TO 1960: Life, Physical and Social Science Technicians 4.97e-01
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 2.71e-01
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 3.74e-01
## Job052200 TO 2340: Teachers 2.14e-01
## Job052400 TO 2550: Education, Training and Library Workers 3.00e-01
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 3.17e-01
## Job052800 TO 2960: Media and Communications Workers 4.11e-01
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 2.38e-01
## Job053300 TO 3650: Health Care Technical and Support Occupations 2.25e-01
## Job053700 TO 3950: Protective Service Occupations 2.49e-01
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 2.49e-01
## Job054200 TO 4250: Cleaning and Building Service Occupations 2.48e-01
## Job054300 TO 4430: Entertainment Attendants and Related Workers 4.46e-01
## Job054500 TO 4650: Personal Care and Service Workers 2.81e-01
## Job054700 TO 4960: Sales and Related Workers 1.99e-01
## Job05500 TO 950: Management Related Occupations 2.15e-01
## Job055000 TO 5930: Office and Administrative Support Workers 1.91e-01
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations 4.71e-01
## Job056200 TO 6940: Construction Trade and Extraction Workers 2.15e-01
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 2.19e-01
## Job057700 TO 7750: Production and Operating Workers 2.56e-01
## Job057800 TO 7850: Food Preparation Occupations 7.30e-01
## Job057900 TO 8960: Setters, Operators and Tenders 2.21e-01
## Job059000 TO 9750: Transportation and Material Moving Workers 2.16e-01
## Job059990: Uncodeable 1.24e+00
## bmi 3.58e-03
## intelligence 1.64e-02
## FamilyIncome78 2.04e-06
## t value
## (Intercept) -6.53
## Gendermale 0.52
## logincome 6.76
## MotherEd 2.84
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 2.69
## Job051000 TO 1240: Mathematical and Computer Scientists 2.04
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians -0.45
## Job051600 TO 1760: Physical Scientists -1.24
## Job051800 TO 1860: Social Scientists and Related Workers -0.16
## Job051900 TO 1960: Life, Physical and Social Science Technicians 0.62
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 1.22
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 1.33
## Job052200 TO 2340: Teachers 2.01
## Job052400 TO 2550: Education, Training and Library Workers 1.28
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 2.39
## Job052800 TO 2960: Media and Communications Workers 1.64
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 2.10
## Job053300 TO 3650: Health Care Technical and Support Occupations -0.46
## Job053700 TO 3950: Protective Service Occupations 2.77
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 0.16
## Job054200 TO 4250: Cleaning and Building Service Occupations -0.15
## Job054300 TO 4430: Entertainment Attendants and Related Workers -1.27
## Job054500 TO 4650: Personal Care and Service Workers 0.63
## Job054700 TO 4960: Sales and Related Workers 1.52
## Job05500 TO 950: Management Related Occupations 2.86
## Job055000 TO 5930: Office and Administrative Support Workers 2.03
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations -0.58
## Job056200 TO 6940: Construction Trade and Extraction Workers 0.10
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 0.34
## Job057700 TO 7750: Production and Operating Workers 0.39
## Job057800 TO 7850: Food Preparation Occupations 0.26
## Job057900 TO 8960: Setters, Operators and Tenders 0.77
## Job059000 TO 9750: Transportation and Material Moving Workers -0.31
## Job059990: Uncodeable -0.18
## bmi -1.56
## intelligence 5.11
## FamilyIncome78 1.10
## Pr(>|t|)
## (Intercept) 8.3e-11
## Gendermale 0.6007
## logincome 1.8e-11
## MotherEd 0.0046
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 0.0073
## Job051000 TO 1240: Mathematical and Computer Scientists 0.0417
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians 0.6563
## Job051600 TO 1760: Physical Scientists 0.2160
## Job051800 TO 1860: Social Scientists and Related Workers 0.8765
## Job051900 TO 1960: Life, Physical and Social Science Technicians 0.5333
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 0.2233
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 0.1839
## Job052200 TO 2340: Teachers 0.0441
## Job052400 TO 2550: Education, Training and Library Workers 0.2014
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 0.0169
## Job052800 TO 2960: Media and Communications Workers 0.1018
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 0.0359
## Job053300 TO 3650: Health Care Technical and Support Occupations 0.6428
## Job053700 TO 3950: Protective Service Occupations 0.0057
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 0.8724
## Job054200 TO 4250: Cleaning and Building Service Occupations 0.8793
## Job054300 TO 4430: Entertainment Attendants and Related Workers 0.2058
## Job054500 TO 4650: Personal Care and Service Workers 0.5277
## Job054700 TO 4960: Sales and Related Workers 0.1299
## Job05500 TO 950: Management Related Occupations 0.0043
## Job055000 TO 5930: Office and Administrative Support Workers 0.0428
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations 0.5607
## Job056200 TO 6940: Construction Trade and Extraction Workers 0.9185
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 0.7338
## Job057700 TO 7750: Production and Operating Workers 0.6973
## Job057800 TO 7850: Food Preparation Occupations 0.7936
## Job057900 TO 8960: Setters, Operators and Tenders 0.4391
## Job059000 TO 9750: Transportation and Material Moving Workers 0.7570
## Job059990: Uncodeable 0.8573
## bmi 0.1187
## intelligence 3.6e-07
## FamilyIncome78 0.2696
##
## (Intercept) ***
## Gendermale
## logincome ***
## MotherEd **
## Job0510 TO 430: Executive, Administrative and Managerial Occupations **
## Job051000 TO 1240: Mathematical and Computer Scientists *
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians
## Job051600 TO 1760: Physical Scientists
## Job051800 TO 1860: Social Scientists and Related Workers
## Job051900 TO 1960: Life, Physical and Social Science Technicians
## Job052000 TO 2060: Counselors, Sociala and Religious Workers
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers
## Job052200 TO 2340: Teachers *
## Job052400 TO 2550: Education, Training and Library Workers
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers *
## Job052800 TO 2960: Media and Communications Workers
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners *
## Job053300 TO 3650: Health Care Technical and Support Occupations
## Job053700 TO 3950: Protective Service Occupations **
## Job054000 TO 4160: Food Preparation and Serving Related Occupations
## Job054200 TO 4250: Cleaning and Building Service Occupations
## Job054300 TO 4430: Entertainment Attendants and Related Workers
## Job054500 TO 4650: Personal Care and Service Workers
## Job054700 TO 4960: Sales and Related Workers
## Job05500 TO 950: Management Related Occupations **
## Job055000 TO 5930: Office and Administrative Support Workers *
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations
## Job056200 TO 6940: Construction Trade and Extraction Workers
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers
## Job057700 TO 7750: Production and Operating Workers
## Job057800 TO 7850: Food Preparation Occupations
## Job057900 TO 8960: Setters, Operators and Tenders
## Job059000 TO 9750: Transportation and Material Moving Workers
## Job059990: Uncodeable
## bmi
## intelligence ***
## FamilyIncome78
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.23 on 2088 degrees of freedom
## Multiple R-squared: 0.121, Adjusted R-squared: 0.106
## F-statistic: 8 on 36 and 2088 DF, p-value: <2e-16
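Because the `Job05` labels are long, R wraps the coefficient table into separate Estimate, Std. Error, t value and Pr(>|t|) blocks, which makes it hard to read one term across all four columns. One way to work around this is to pull the coefficient matrix out of the summary and filter it directly. The sketch below uses the built-in `mtcars` data as a stand-in (the model and the 0.05 cutoff are illustrative assumptions, not the fit shown above).

```r
# Stand-in model on built-in data; the NLSY fit above is not reproduced here.
fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# coef(summary(fit)) returns a numeric matrix with one row per term and the
# columns "Estimate", "Std. Error", "t value", "Pr(>|t|)".
coef_mat <- coef(summary(fit))

# Keep only the terms whose p-value falls below an (assumed) 0.05 cutoff.
signif_terms <- coef_mat[coef_mat[, "Pr(>|t|)"] < 0.05, , drop = FALSE]

# Print with fewer digits so each row fits on one line.
print(round(signif_terms, 4))
```

Applied to a fit with many factor levels, the same filter would list only the handful of job categories flagged with stars in the printed summary.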
##
## Call:
## lm(formula = PC1.esteem ~ Gender + logincome + MotherEd + Inewspaper +
## Imagazine + Ilibrary + Job05 + bmi + intelligence + FamilyIncome78,
## data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.037 -0.916 -0.013 1.039 2.980
##
## Coefficients:
## Estimate
## (Intercept) -2.47e+00
## Gendermale 4.42e-02
## logincome 1.90e-01
## MotherEd 2.71e-02
## Inewspaper1 1.56e-01
## Imagazine1 3.75e-02
## Ilibrary1 6.96e-02
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 5.02e-01
## Job051000 TO 1240: Mathematical and Computer Scientists 5.12e-01
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians -1.09e-01
## Job051600 TO 1760: Physical Scientists -8.20e-01
## Job051800 TO 1860: Social Scientists and Related Workers -7.88e-02
## Job051900 TO 1960: Life, Physical and Social Science Technicians 2.90e-01
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 3.41e-01
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 4.77e-01
## Job052200 TO 2340: Teachers 4.23e-01
## Job052400 TO 2550: Education, Training and Library Workers 3.78e-01
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 7.36e-01
## Job052800 TO 2960: Media and Communications Workers 6.56e-01
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 4.94e-01
## Job053300 TO 3650: Health Care Technical and Support Occupations -9.65e-02
## Job053700 TO 3950: Protective Service Occupations 6.83e-01
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 1.38e-02
## Job054200 TO 4250: Cleaning and Building Service Occupations -1.78e-02
## Job054300 TO 4430: Entertainment Attendants and Related Workers -5.94e-01
## Job054500 TO 4650: Personal Care and Service Workers 1.75e-01
## Job054700 TO 4960: Sales and Related Workers 2.97e-01
## Job05500 TO 950: Management Related Occupations 6.14e-01
## Job055000 TO 5930: Office and Administrative Support Workers 3.85e-01
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations -1.94e-01
## Job056200 TO 6940: Construction Trade and Extraction Workers 2.98e-02
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 7.63e-02
## Job057700 TO 7750: Production and Operating Workers 1.12e-01
## Job057800 TO 7850: Food Preparation Occupations 1.51e-01
## Job057900 TO 8960: Setters, Operators and Tenders 1.76e-01
## Job059000 TO 9750: Transportation and Material Moving Workers -7.62e-02
## Job059990: Uncodeable -2.66e-01
## bmi -5.57e-03
## intelligence 7.60e-02
## FamilyIncome78 1.61e-06
## Std. Error
## (Intercept) 3.67e-01
## Gendermale 6.61e-02
## logincome 2.89e-02
## MotherEd 1.22e-02
## Inewspaper1 8.66e-02
## Imagazine1 6.66e-02
## Ilibrary1 6.72e-02
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 1.91e-01
## Job051000 TO 1240: Mathematical and Computer Scientists 2.44e-01
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians 2.54e-01
## Job051600 TO 1760: Physical Scientists 6.40e-01
## Job051800 TO 1860: Social Scientists and Related Workers 5.31e-01
## Job051900 TO 1960: Life, Physical and Social Science Technicians 4.97e-01
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 2.71e-01
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 3.74e-01
## Job052200 TO 2340: Teachers 2.14e-01
## Job052400 TO 2550: Education, Training and Library Workers 3.00e-01
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 3.17e-01
## Job052800 TO 2960: Media and Communications Workers 4.11e-01
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 2.38e-01
## Job053300 TO 3650: Health Care Technical and Support Occupations 2.24e-01
## Job053700 TO 3950: Protective Service Occupations 2.49e-01
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 2.49e-01
## Job054200 TO 4250: Cleaning and Building Service Occupations 2.49e-01
## Job054300 TO 4430: Entertainment Attendants and Related Workers 4.47e-01
## Job054500 TO 4650: Personal Care and Service Workers 2.80e-01
## Job054700 TO 4960: Sales and Related Workers 1.99e-01
## Job05500 TO 950: Management Related Occupations 2.15e-01
## Job055000 TO 5930: Office and Administrative Support Workers 1.91e-01
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations 4.72e-01
## Job056200 TO 6940: Construction Trade and Extraction Workers 2.15e-01
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 2.19e-01
## Job057700 TO 7750: Production and Operating Workers 2.56e-01
## Job057800 TO 7850: Food Preparation Occupations 7.30e-01
## Job057900 TO 8960: Setters, Operators and Tenders 2.21e-01
## Job059000 TO 9750: Transportation and Material Moving Workers 2.16e-01
## Job059990: Uncodeable 1.24e+00
## bmi 3.58e-03
## intelligence 1.68e-02
## FamilyIncome78 2.06e-06
## t value
## (Intercept) -6.74
## Gendermale 0.67
## logincome 6.60
## MotherEd 2.22
## Inewspaper1 1.80
## Imagazine1 0.56
## Ilibrary1 1.04
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 2.63
## Job051000 TO 1240: Mathematical and Computer Scientists 2.10
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians -0.43
## Job051600 TO 1760: Physical Scientists -1.28
## Job051800 TO 1860: Social Scientists and Related Workers -0.15
## Job051900 TO 1960: Life, Physical and Social Science Technicians 0.58
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 1.26
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 1.28
## Job052200 TO 2340: Teachers 1.98
## Job052400 TO 2550: Education, Training and Library Workers 1.26
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 2.32
## Job052800 TO 2960: Media and Communications Workers 1.59
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 2.08
## Job053300 TO 3650: Health Care Technical and Support Occupations -0.43
## Job053700 TO 3950: Protective Service Occupations 2.75
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 0.06
## Job054200 TO 4250: Cleaning and Building Service Occupations -0.07
## Job054300 TO 4430: Entertainment Attendants and Related Workers -1.33
## Job054500 TO 4650: Personal Care and Service Workers 0.62
## Job054700 TO 4960: Sales and Related Workers 1.49
## Job05500 TO 950: Management Related Occupations 2.85
## Job055000 TO 5930: Office and Administrative Support Workers 2.02
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations -0.41
## Job056200 TO 6940: Construction Trade and Extraction Workers 0.14
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 0.35
## Job057700 TO 7750: Production and Operating Workers 0.44
## Job057800 TO 7850: Food Preparation Occupations 0.21
## Job057900 TO 8960: Setters, Operators and Tenders 0.80
## Job059000 TO 9750: Transportation and Material Moving Workers -0.35
## Job059990: Uncodeable -0.21
## bmi -1.56
## intelligence 4.54
## FamilyIncome78 0.78
## Pr(>|t|)
## (Intercept) 2.1e-11
## Gendermale 0.5036
## logincome 5.1e-11
## MotherEd 0.0265
## Inewspaper1 0.0725
## Imagazine1 0.5729
## Ilibrary1 0.3002
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 0.0085
## Job051000 TO 1240: Mathematical and Computer Scientists 0.0358
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians 0.6686
## Job051600 TO 1760: Physical Scientists 0.2002
## Job051800 TO 1860: Social Scientists and Related Workers 0.8821
## Job051900 TO 1960: Life, Physical and Social Science Technicians 0.5590
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 0.2090
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 0.2015
## Job052200 TO 2340: Teachers 0.0484
## Job052400 TO 2550: Education, Training and Library Workers 0.2074
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 0.0203
## Job052800 TO 2960: Media and Communications Workers 0.1110
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 0.0379
## Job053300 TO 3650: Health Care Technical and Support Occupations 0.6675
## Job053700 TO 3950: Protective Service Occupations 0.0061
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 0.9558
## Job054200 TO 4250: Cleaning and Building Service Occupations 0.9428
## Job054300 TO 4430: Entertainment Attendants and Related Workers 0.1839
## Job054500 TO 4650: Personal Care and Service Workers 0.5334
## Job054700 TO 4960: Sales and Related Workers 0.1367
## Job05500 TO 950: Management Related Occupations 0.0044
## Job055000 TO 5930: Office and Administrative Support Workers 0.0434
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations 0.6806
## Job056200 TO 6940: Construction Trade and Extraction Workers 0.8894
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 0.7274
## Job057700 TO 7750: Production and Operating Workers 0.6609
## Job057800 TO 7850: Food Preparation Occupations 0.8364
## Job057900 TO 8960: Setters, Operators and Tenders 0.4247
## Job059000 TO 9750: Transportation and Material Moving Workers 0.7246
## Job059990: Uncodeable 0.8300
## bmi 0.1198
## intelligence 6.1e-06
## FamilyIncome78 0.4352
##
## (Intercept) ***
## Gendermale
## logincome ***
## MotherEd *
## Inewspaper1 .
## Imagazine1
## Ilibrary1
## Job0510 TO 430: Executive, Administrative and Managerial Occupations **
## Job051000 TO 1240: Mathematical and Computer Scientists *
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians
## Job051600 TO 1760: Physical Scientists
## Job051800 TO 1860: Social Scientists and Related Workers
## Job051900 TO 1960: Life, Physical and Social Science Technicians
## Job052000 TO 2060: Counselors, Sociala and Religious Workers
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers
## Job052200 TO 2340: Teachers *
## Job052400 TO 2550: Education, Training and Library Workers
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers *
## Job052800 TO 2960: Media and Communications Workers
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners *
## Job053300 TO 3650: Health Care Technical and Support Occupations
## Job053700 TO 3950: Protective Service Occupations **
## Job054000 TO 4160: Food Preparation and Serving Related Occupations
## Job054200 TO 4250: Cleaning and Building Service Occupations
## Job054300 TO 4430: Entertainment Attendants and Related Workers
## Job054500 TO 4650: Personal Care and Service Workers
## Job054700 TO 4960: Sales and Related Workers
## Job05500 TO 950: Management Related Occupations **
## Job055000 TO 5930: Office and Administrative Support Workers *
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations
## Job056200 TO 6940: Construction Trade and Extraction Workers
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers
## Job057700 TO 7750: Production and Operating Workers
## Job057800 TO 7850: Food Preparation Occupations
## Job057900 TO 8960: Setters, Operators and Tenders
## Job059000 TO 9750: Transportation and Material Moving Workers
## Job059990: Uncodeable
## bmi
## intelligence ***
## FamilyIncome78
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.23 on 2085 degrees of freedom
## Multiple R-squared: 0.124, Adjusted R-squared: 0.107
## F-statistic: 7.54 on 39 and 2085 DF, p-value: <2e-16
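The two summaries above differ only by the three indicators `Inewspaper`, `Imagazine` and `Ilibrary`, so they are nested and can be compared with a partial F statistic. As a rough sketch, the statistic can be approximated from the printed R-squared values alone. The helper `partial_f` is a hypothetical name introduced here, and because the inputs are rounded to three decimals the result is only indicative; calling `anova()` on the two fitted objects would give the exact test.

```r
# Partial F statistic for nested linear models, from their R-squared values.
#   r2_small : multiple R-squared of the smaller model
#   r2_big   : multiple R-squared of the larger model
#   q        : number of coefficients added in the larger model
#   df_big   : residual degrees of freedom of the larger model
partial_f <- function(r2_small, r2_big, q, df_big) {
  ((r2_big - r2_small) / q) / ((1 - r2_big) / df_big)
}

# Rounded values copied from the two printed summaries (0.121 vs 0.124,
# 3 added indicators, 2085 residual degrees of freedom).
f_stat <- partial_f(r2_small = 0.121, r2_big = 0.124, q = 3, df_big = 2085)

# Upper-tail probability under the F(3, 2085) reference distribution.
p_val <- pf(f_stat, df1 = 3, df2 = 2085, lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
```

With these rounded inputs the statistic comes out near 2.4, which is in line with the weak individual t values of the three indicators.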
##
## Call:
## lm(formula = PC1.esteem ~ Gender + logincome + MotherEd + Education05 +
## Inewspaper + Imagazine + Ilibrary + Job05 + bmi + intelligence +
## FamilyIncome78, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.151 -0.929 -0.026 1.019 2.979
##
## Coefficients:
## Estimate
## (Intercept) -3.21e+00
## Gendermale 4.94e-02
## logincome 1.93e-01
## MotherEd 1.78e-02
## Education05 6.05e-02
## Inewspaper1 1.60e-01
## Imagazine1 2.44e-02
## Ilibrary1 5.65e-02
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 4.58e-01
## Job051000 TO 1240: Mathematical and Computer Scientists 4.66e-01
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians -1.41e-01
## Job051600 TO 1760: Physical Scientists -9.43e-01
## Job051800 TO 1860: Social Scientists and Related Workers -2.42e-01
## Job051900 TO 1960: Life, Physical and Social Science Technicians 2.70e-01
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 2.04e-01
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 3.25e-01
## Job052200 TO 2340: Teachers 2.38e-01
## Job052400 TO 2550: Education, Training and Library Workers 3.47e-01
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 6.90e-01
## Job052800 TO 2960: Media and Communications Workers 5.99e-01
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 3.69e-01
## Job053300 TO 3650: Health Care Technical and Support Occupations -1.04e-01
## Job053700 TO 3950: Protective Service Occupations 6.66e-01
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 5.95e-02
## Job054200 TO 4250: Cleaning and Building Service Occupations 1.64e-02
## Job054300 TO 4430: Entertainment Attendants and Related Workers -6.08e-01
## Job054500 TO 4650: Personal Care and Service Workers 1.83e-01
## Job054700 TO 4960: Sales and Related Workers 2.91e-01
## Job05500 TO 950: Management Related Occupations 5.48e-01
## Job055000 TO 5930: Office and Administrative Support Workers 3.95e-01
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations -1.42e-01
## Job056200 TO 6940: Construction Trade and Extraction Workers 1.00e-01
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 1.44e-01
## Job057700 TO 7750: Production and Operating Workers 1.45e-01
## Job057800 TO 7850: Food Preparation Occupations 2.25e-01
## Job057900 TO 8960: Setters, Operators and Tenders 2.34e-01
## Job059000 TO 9750: Transportation and Material Moving Workers -3.36e-02
## Job059990: Uncodeable -1.44e-01
## bmi -4.83e-03
## intelligence 5.45e-02
## FamilyIncome78 1.08e-06
## Std. Error
## (Intercept) 4.06e-01
## Gendermale 6.58e-02
## logincome 2.87e-02
## MotherEd 1.24e-02
## Education05 1.45e-02
## Inewspaper1 8.62e-02
## Imagazine1 6.64e-02
## Ilibrary1 6.70e-02
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 1.90e-01
## Job051000 TO 1240: Mathematical and Computer Scientists 2.43e-01
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians 2.53e-01
## Job051600 TO 1760: Physical Scientists 6.38e-01
## Job051800 TO 1860: Social Scientists and Related Workers 5.31e-01
## Job051900 TO 1960: Life, Physical and Social Science Technicians 4.95e-01
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 2.72e-01
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 3.74e-01
## Job052200 TO 2340: Teachers 2.18e-01
## Job052400 TO 2550: Education, Training and Library Workers 2.99e-01
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 3.16e-01
## Job052800 TO 2960: Media and Communications Workers 4.10e-01
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 2.39e-01
## Job053300 TO 3650: Health Care Technical and Support Occupations 2.24e-01
## Job053700 TO 3950: Protective Service Occupations 2.48e-01
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 2.49e-01
## Job054200 TO 4250: Cleaning and Building Service Occupations 2.48e-01
## Job054300 TO 4430: Entertainment Attendants and Related Workers 4.45e-01
## Job054500 TO 4650: Personal Care and Service Workers 2.79e-01
## Job054700 TO 4960: Sales and Related Workers 1.98e-01
## Job05500 TO 950: Management Related Occupations 2.15e-01
## Job055000 TO 5930: Office and Administrative Support Workers 1.90e-01
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations 4.71e-01
## Job056200 TO 6940: Construction Trade and Extraction Workers 2.14e-01
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 2.19e-01
## Job057700 TO 7750: Production and Operating Workers 2.55e-01
## Job057800 TO 7850: Food Preparation Occupations 7.27e-01
## Job057900 TO 8960: Setters, Operators and Tenders 2.20e-01
## Job059000 TO 9750: Transportation and Material Moving Workers 2.16e-01
## Job059990: Uncodeable 1.23e+00
## bmi 3.57e-03
## intelligence 1.75e-02
## FamilyIncome78 2.06e-06
## t value
## (Intercept) -7.91
## Gendermale 0.75
## logincome 6.73
## MotherEd 1.44
## Education05 4.18
## Inewspaper1 1.85
## Imagazine1 0.37
## Ilibrary1 0.84
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 2.41
## Job051000 TO 1240: Mathematical and Computer Scientists 1.92
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians -0.56
## Job051600 TO 1760: Physical Scientists -1.48
## Job051800 TO 1860: Social Scientists and Related Workers -0.46
## Job051900 TO 1960: Life, Physical and Social Science Technicians 0.55
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 0.75
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 0.87
## Job052200 TO 2340: Teachers 1.09
## Job052400 TO 2550: Education, Training and Library Workers 1.16
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 2.19
## Job052800 TO 2960: Media and Communications Workers 1.46
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 1.54
## Job053300 TO 3650: Health Care Technical and Support Occupations -0.46
## Job053700 TO 3950: Protective Service Occupations 2.69
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 0.24
## Job054200 TO 4250: Cleaning and Building Service Occupations 0.07
## Job054300 TO 4430: Entertainment Attendants and Related Workers -1.37
## Job054500 TO 4650: Personal Care and Service Workers 0.65
## Job054700 TO 4960: Sales and Related Workers 1.46
## Job05500 TO 950: Management Related Occupations 2.55
## Job055000 TO 5930: Office and Administrative Support Workers 2.08
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations -0.30
## Job056200 TO 6940: Construction Trade and Extraction Workers 0.47
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 0.66
## Job057700 TO 7750: Production and Operating Workers 0.57
## Job057800 TO 7850: Food Preparation Occupations 0.31
## Job057900 TO 8960: Setters, Operators and Tenders 1.06
## Job059000 TO 9750: Transportation and Material Moving Workers -0.16
## Job059990: Uncodeable -0.12
## bmi -1.35
## intelligence 3.12
## FamilyIncome78 0.52
## Pr(>|t|)
## (Intercept) 4.2e-15
## Gendermale 0.4526
## logincome 2.2e-11
## MotherEd 0.1501
## Education05 3.1e-05
## Inewspaper1 0.0642
## Imagazine1 0.7131
## Ilibrary1 0.3992
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 0.0159
## Job051000 TO 1240: Mathematical and Computer Scientists 0.0550
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians 0.5766
## Job051600 TO 1760: Physical Scientists 0.1397
## Job051800 TO 1860: Social Scientists and Related Workers 0.6478
## Job051900 TO 1960: Life, Physical and Social Science Technicians 0.5852
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 0.4525
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 0.3844
## Job052200 TO 2340: Teachers 0.2748
## Job052400 TO 2550: Education, Training and Library Workers 0.2453
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 0.0289
## Job052800 TO 2960: Media and Communications Workers 0.1443
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 0.1227
## Job053300 TO 3650: Health Care Technical and Support Occupations 0.6427
## Job053700 TO 3950: Protective Service Occupations 0.0073
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 0.8111
## Job054200 TO 4250: Cleaning and Building Service Occupations 0.9472
## Job054300 TO 4430: Entertainment Attendants and Related Workers 0.1717
## Job054500 TO 4650: Personal Care and Service Workers 0.5128
## Job054700 TO 4960: Sales and Related Workers 0.1435
## Job05500 TO 950: Management Related Occupations 0.0109
## Job055000 TO 5930: Office and Administrative Support Workers 0.0379
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations 0.7623
## Job056200 TO 6940: Construction Trade and Extraction Workers 0.6393
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 0.5099
## Job057700 TO 7750: Production and Operating Workers 0.5698
## Job057800 TO 7850: Food Preparation Occupations 0.7572
## Job057900 TO 8960: Setters, Operators and Tenders 0.2885
## Job059000 TO 9750: Transportation and Material Moving Workers 0.8763
## Job059990: Uncodeable 0.9073
## bmi 0.1764
## intelligence 0.0018
## FamilyIncome78 0.6013
##
## (Intercept) ***
## Gendermale
## logincome ***
## MotherEd
## Education05 ***
## Inewspaper1 .
## Imagazine1
## Ilibrary1
## Job0510 TO 430: Executive, Administrative and Managerial Occupations *
## Job051000 TO 1240: Mathematical and Computer Scientists .
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians
## Job051600 TO 1760: Physical Scientists
## Job051800 TO 1860: Social Scientists and Related Workers
## Job051900 TO 1960: Life, Physical and Social Science Technicians
## Job052000 TO 2060: Counselors, Sociala and Religious Workers
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers
## Job052200 TO 2340: Teachers
## Job052400 TO 2550: Education, Training and Library Workers
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers *
## Job052800 TO 2960: Media and Communications Workers
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners
## Job053300 TO 3650: Health Care Technical and Support Occupations
## Job053700 TO 3950: Protective Service Occupations **
## Job054000 TO 4160: Food Preparation and Serving Related Occupations
## Job054200 TO 4250: Cleaning and Building Service Occupations
## Job054300 TO 4430: Entertainment Attendants and Related Workers
## Job054500 TO 4650: Personal Care and Service Workers
## Job054700 TO 4960: Sales and Related Workers
## Job05500 TO 950: Management Related Occupations *
## Job055000 TO 5930: Office and Administrative Support Workers *
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations
## Job056200 TO 6940: Construction Trade and Extraction Workers
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers
## Job057700 TO 7750: Production and Operating Workers
## Job057800 TO 7850: Food Preparation Occupations
## Job057900 TO 8960: Setters, Operators and Tenders
## Job059000 TO 9750: Transportation and Material Moving Workers
## Job059990: Uncodeable
## bmi
## intelligence **
## FamilyIncome78
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.22 on 2084 degrees of freedom
## Multiple R-squared: 0.131, Adjusted R-squared: 0.114
## F-statistic: 7.84 on 40 and 2084 DF, p-value: <2e-16
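The fit that follows drops `Imagazine` and `Ilibrary`, neither of which is close to significant in the summary above. How the original refit was coded is not shown; one common way to do this kind of step is `update()` followed by `anova()` to confirm that the dropped terms were not contributing. The sketch below uses `mtcars` and arbitrary predictors purely as stand-ins.

```r
# Stand-in "full" model on built-in data (predictors chosen for illustration).
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Refit without two terms; ". ~ ." keeps the response and remaining predictors.
reduced <- update(full, . ~ . - qsec - drat)

# Partial F-test of the reduced model against the full model.
cmp <- anova(reduced, full)
print(cmp)
```

A large p-value in the second row of the `anova()` table would support keeping the smaller model.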
##
## Call:
## lm(formula = PC1.esteem ~ Gender + logincome + MotherEd + Education05 +
## Inewspaper + Job05 + bmi + intelligence + FamilyIncome78,
## data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.125 -0.931 -0.024 1.023 3.015
##
## Coefficients:
## Estimate
## (Intercept) -3.19e+00
## Gendermale 4.63e-02
## logincome 1.93e-01
## MotherEd 1.94e-02
## Education05 6.13e-02
## Inewspaper1 1.75e-01
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 4.61e-01
## Job051000 TO 1240: Mathematical and Computer Scientists 4.68e-01
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians -1.45e-01
## Job051600 TO 1760: Physical Scientists -9.37e-01
## Job051800 TO 1860: Social Scientists and Related Workers -2.59e-01
## Job051900 TO 1960: Life, Physical and Social Science Technicians 2.65e-01
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 2.03e-01
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 3.31e-01
## Job052200 TO 2340: Teachers 2.36e-01
## Job052400 TO 2550: Education, Training and Library Workers 3.47e-01
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 6.98e-01
## Job052800 TO 2960: Media and Communications Workers 6.03e-01
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 3.66e-01
## Job053300 TO 3650: Health Care Technical and Support Occupations -1.09e-01
## Job053700 TO 3950: Protective Service Occupations 6.69e-01
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 6.02e-02
## Job054200 TO 4250: Cleaning and Building Service Occupations 7.25e-03
## Job054300 TO 4430: Entertainment Attendants and Related Workers -6.05e-01
## Job054500 TO 4650: Personal Care and Service Workers 1.83e-01
## Job054700 TO 4960: Sales and Related Workers 2.91e-01
## Job05500 TO 950: Management Related Occupations 5.44e-01
## Job055000 TO 5930: Office and Administrative Support Workers 3.91e-01
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations -1.55e-01
## Job056200 TO 6940: Construction Trade and Extraction Workers 9.99e-02
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 1.41e-01
## Job057700 TO 7750: Production and Operating Workers 1.42e-01
## Job057800 TO 7850: Food Preparation Occupations 2.35e-01
## Job057900 TO 8960: Setters, Operators and Tenders 2.28e-01
## Job059000 TO 9750: Transportation and Material Moving Workers -3.94e-02
## Job059990: Uncodeable -1.26e-01
## bmi -4.85e-03
## intelligence 5.59e-02
## FamilyIncome78 1.20e-06
## Std. Error
## (Intercept) 4.05e-01
## Gendermale 6.57e-02
## logincome 2.87e-02
## MotherEd 1.22e-02
## Education05 1.44e-02
## Inewspaper1 8.45e-02
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 1.90e-01
## Job051000 TO 1240: Mathematical and Computer Scientists 2.43e-01
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians 2.53e-01
## Job051600 TO 1760: Physical Scientists 6.38e-01
## Job051800 TO 1860: Social Scientists and Related Workers 5.30e-01
## Job051900 TO 1960: Life, Physical and Social Science Technicians 4.94e-01
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 2.72e-01
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 3.74e-01
## Job052200 TO 2340: Teachers 2.18e-01
## Job052400 TO 2550: Education, Training and Library Workers 2.98e-01
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 3.15e-01
## Job052800 TO 2960: Media and Communications Workers 4.10e-01
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 2.39e-01
## Job053300 TO 3650: Health Care Technical and Support Occupations 2.23e-01
## Job053700 TO 3950: Protective Service Occupations 2.48e-01
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 2.49e-01
## Job054200 TO 4250: Cleaning and Building Service Occupations 2.47e-01
## Job054300 TO 4430: Entertainment Attendants and Related Workers 4.44e-01
## Job054500 TO 4650: Personal Care and Service Workers 2.79e-01
## Job054700 TO 4960: Sales and Related Workers 1.98e-01
## Job05500 TO 950: Management Related Occupations 2.15e-01
## Job055000 TO 5930: Office and Administrative Support Workers 1.90e-01
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations 4.70e-01
## Job056200 TO 6940: Construction Trade and Extraction Workers 2.14e-01
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 2.19e-01
## Job057700 TO 7750: Production and Operating Workers 2.55e-01
## Job057800 TO 7850: Food Preparation Occupations 7.27e-01
## Job057900 TO 8960: Setters, Operators and Tenders 2.20e-01
## Job059000 TO 9750: Transportation and Material Moving Workers 2.15e-01
## Job059990: Uncodeable 1.23e+00
## bmi 3.57e-03
## intelligence 1.73e-02
## FamilyIncome78 2.05e-06
## t value
## (Intercept) -7.87
## Gendermale 0.70
## logincome 6.71
## MotherEd 1.59
## Education05 4.25
## Inewspaper1 2.07
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 2.43
## Job051000 TO 1240: Mathematical and Computer Scientists 1.93
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians -0.57
## Job051600 TO 1760: Physical Scientists -1.47
## Job051800 TO 1860: Social Scientists and Related Workers -0.49
## Job051900 TO 1960: Life, Physical and Social Science Technicians 0.54
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 0.75
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 0.89
## Job052200 TO 2340: Teachers 1.08
## Job052400 TO 2550: Education, Training and Library Workers 1.16
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 2.21
## Job052800 TO 2960: Media and Communications Workers 1.47
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 1.54
## Job053300 TO 3650: Health Care Technical and Support Occupations -0.49
## Job053700 TO 3950: Protective Service Occupations 2.70
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 0.24
## Job054200 TO 4250: Cleaning and Building Service Occupations 0.03
## Job054300 TO 4430: Entertainment Attendants and Related Workers -1.36
## Job054500 TO 4650: Personal Care and Service Workers 0.66
## Job054700 TO 4960: Sales and Related Workers 1.47
## Job05500 TO 950: Management Related Occupations 2.53
## Job055000 TO 5930: Office and Administrative Support Workers 2.06
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations -0.33
## Job056200 TO 6940: Construction Trade and Extraction Workers 0.47
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 0.65
## Job057700 TO 7750: Production and Operating Workers 0.56
## Job057800 TO 7850: Food Preparation Occupations 0.32
## Job057900 TO 8960: Setters, Operators and Tenders 1.03
## Job059000 TO 9750: Transportation and Material Moving Workers -0.18
## Job059990: Uncodeable -0.10
## bmi -1.36
## intelligence 3.23
## FamilyIncome78 0.58
## Pr(>|t|)
## (Intercept) 5.7e-15
## Gendermale 0.4814
## logincome 2.4e-11
## MotherEd 0.1124
## Education05 2.2e-05
## Inewspaper1 0.0386
## Job0510 TO 430: Executive, Administrative and Managerial Occupations 0.0154
## Job051000 TO 1240: Mathematical and Computer Scientists 0.0539
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians 0.5661
## Job051600 TO 1760: Physical Scientists 0.1421
## Job051800 TO 1860: Social Scientists and Related Workers 0.6249
## Job051900 TO 1960: Life, Physical and Social Science Technicians 0.5920
## Job052000 TO 2060: Counselors, Sociala and Religious Workers 0.4556
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers 0.3761
## Job052200 TO 2340: Teachers 0.2785
## Job052400 TO 2550: Education, Training and Library Workers 0.2456
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers 0.0271
## Job052800 TO 2960: Media and Communications Workers 0.1410
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners 0.1249
## Job053300 TO 3650: Health Care Technical and Support Occupations 0.6254
## Job053700 TO 3950: Protective Service Occupations 0.0069
## Job054000 TO 4160: Food Preparation and Serving Related Occupations 0.8087
## Job054200 TO 4250: Cleaning and Building Service Occupations 0.9766
## Job054300 TO 4430: Entertainment Attendants and Related Workers 0.1736
## Job054500 TO 4650: Personal Care and Service Workers 0.5125
## Job054700 TO 4960: Sales and Related Workers 0.1424
## Job05500 TO 950: Management Related Occupations 0.0115
## Job055000 TO 5930: Office and Administrative Support Workers 0.0393
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations 0.7420
## Job056200 TO 6940: Construction Trade and Extraction Workers 0.6412
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers 0.5189
## Job057700 TO 7750: Production and Operating Workers 0.5773
## Job057800 TO 7850: Food Preparation Occupations 0.7461
## Job057900 TO 8960: Setters, Operators and Tenders 0.3014
## Job059000 TO 9750: Transportation and Material Moving Workers 0.8549
## Job059990: Uncodeable 0.9185
## bmi 0.1741
## intelligence 0.0013
## FamilyIncome78 0.5588
##
## (Intercept) ***
## Gendermale
## logincome ***
## MotherEd
## Education05 ***
## Inewspaper1 *
## Job0510 TO 430: Executive, Administrative and Managerial Occupations *
## Job051000 TO 1240: Mathematical and Computer Scientists .
## Job051300 TO 1560: Engineers, Architects, Surveyers, Engineering and Related Technicians
## Job051600 TO 1760: Physical Scientists
## Job051800 TO 1860: Social Scientists and Related Workers
## Job051900 TO 1960: Life, Physical and Social Science Technicians
## Job052000 TO 2060: Counselors, Sociala and Religious Workers
## Job052100 TO 2150: Lawyers, Judges and Legal Support Workers
## Job052200 TO 2340: Teachers
## Job052400 TO 2550: Education, Training and Library Workers
## Job052600 TO 2760: Entertainers and Performers, Sports and Related Workers *
## Job052800 TO 2960: Media and Communications Workers
## Job053000 TO 3260: Health Diagnosing and Treating Practitioners
## Job053300 TO 3650: Health Care Technical and Support Occupations
## Job053700 TO 3950: Protective Service Occupations **
## Job054000 TO 4160: Food Preparation and Serving Related Occupations
## Job054200 TO 4250: Cleaning and Building Service Occupations
## Job054300 TO 4430: Entertainment Attendants and Related Workers
## Job054500 TO 4650: Personal Care and Service Workers
## Job054700 TO 4960: Sales and Related Workers
## Job05500 TO 950: Management Related Occupations *
## Job055000 TO 5930: Office and Administrative Support Workers *
## Job056000 TO 6130: Farming, Fishing and Forestry Occupations
## Job056200 TO 6940: Construction Trade and Extraction Workers
## Job057000 TO 7620: Installation, Maintenance and Repairs Workers
## Job057700 TO 7750: Production and Operating Workers
## Job057800 TO 7850: Food Preparation Occupations
## Job057900 TO 8960: Setters, Operators and Tenders
## Job059000 TO 9750: Transportation and Material Moving Workers
## Job059990: Uncodeable
## bmi
## intelligence **
## FamilyIncome78
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.22 on 2086 degrees of freedom
## Multiple R-squared: 0.13, Adjusted R-squared: 0.115
## F-statistic: 8.24 on 38 and 2086 DF, p-value: <2e-16
fit15 seems to be the best model so far
- How did you land on this model? Run a model diagnosis to see if the linear model assumptions are reasonably met.
We are choosing fit15 as our best model given its
relatively higher \(R^2\) value and its features, spanning home environment and
personal demographics, which have a significant linear relationship with
PC1.
Residual Plot
There appears to be heteroscedasticity in the model: the spread of the residuals is not constant across the range of the plot. Non-constant variance violates the equal-variance assumption of the linear model.
Check for Normality
The points in the qqplot deviate noticeably from the reference line, indicating that the residuals may not be normally distributed.
Taken together, we do not believe there is enough evidence that the assumptions of the linear model are met.
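The two checks described above can be produced with the base R plot method for lm objects. The sketch below uses simulated data with deliberately non-constant variance, because the data behind fit15 are not shown here; fit_demo and the simulated variables are hypothetical stand-ins, not the model from this report.

```r
# Hypothetical stand-in data: the error spread grows with x (heteroscedastic by design)
set.seed(1)
x <- runif(200, 1, 10)
y <- 2 + 0.5 * x + rnorm(200, sd = 0.3 * x)
fit_demo <- lm(y ~ x)

# Residuals vs fitted values: a funnel shape suggests non-constant variance
plot(fit_demo, which = 1)

# Normal Q-Q plot of standardized residuals: departures from the line suggest non-normality
plot(fit_demo, which = 2)

# Rough numeric companion: residual spread in the upper vs lower half of fitted values
low <- fitted(fit_demo) <= median(fitted(fit_demo))
sd_ratio <- sd(resid(fit_demo)[!low]) / sd(resid(fit_demo)[low])
sd_ratio
```

A ratio well above 1 points in the same direction as a funnel-shaped residual plot; replacing fit_demo with the fitted model of interest gives the corresponding plots for that model.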
- Write a summary of your findings. In particular, explain what and how the variables in the model affect one's self-esteem.
Because the diagnostics above suggest that the linear model assumptions are not fully met, these results should be interpreted with caution; a linear model may not be the best choice for predicting the PC1 score of self-esteem. With that in mind, log income, education, and intelligence have positive and highly significant coefficients (p < 0.01). Reading newspapers and several occupation groups (executive and managerial, management related, protective service, entertainers and performers, office and administrative support) are also positive and significant at the 0.05 level. Male gender, mother's education, and family income in 1978 have positive coefficients as well, but they are not significant at the 0.05 level (p = 0.48, 0.11, and 0.56). BMI and some STEM-related occupations (engineers, physical scientists) have negative coefficients, but none of these is significant at the 0.05 level. Among the personal background features, income, education, and intelligence show the strongest linear relationships with the self-esteem score. Together, the p-values and \(\beta\) coefficients give a complementary picture of the magnitude and direction of these linear relationships and suggest that self-perception may be shaped more by personal characteristics than by family environment, although the model explains only about 13% of the variance in PC1.
The Cancer Genome Atlas (TCGA), a landmark cancer genomics program by the National Cancer Institute (NCI), molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types. The genome data is open to the public from the Genomic Data Commons Data Portal (GDC).
In this study, we focus on 4 sub-types of breast cancer (BRCA): basal-like (basal), Luminal A-like (lumA), Luminal B-like (lumB), HER2-enriched. The sub-type is based on PAM50, a clinical-grade luminal-basal classifier.
We will try to use mRNA expression data alone without the labels to classify 4 sub-types. Classification without labels or prediction without outcomes is called unsupervised learning. We will use K-means and spectrum clustering to cluster the mRNA data and see whether the sub-type can be separated through mRNA data.
We first read the data using data.table::fread() which
is a faster way to read in big data than read.csv().
Summary and transformation
How many patients are there in each sub-type?
Randomly pick 5 genes and plot the histogram by each sub-type.
Remove genes with zero counts and no variability. Then apply a logarithmic transform.
Apply kmeans on the transformed dataset with 4 centers and output
the discrepancy table between the real sub-type
brca_subtype and the cluster labels.
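A minimal sketch of the filtering, log transform, k-means, and comparison table steps, using a small simulated count matrix in place of the mRNA data. The matrix counts, the labels in subtype, and the choice of log2(x + 1) are assumptions of this sketch, not the data or transform used in the assignment.

```r
set.seed(10)
n <- 60; p <- 50
grp <- rep(1:4, each = n / 4)
subtype <- paste0("type", grp)            # made-up labels standing in for brca_subtype

# Simulated counts: each group has its own block of 10 highly expressed "genes"
lambda <- matrix(20, n, p)
for (g in 1:4) lambda[grp == g, (10 * g - 9):(10 * g)] <- 200
counts <- matrix(rpois(n * p, lambda), nrow = n)
counts[, 49:50] <- 0                      # two genes with zero counts everywhere

# Drop genes with zero total count or zero variance, then log-transform
keep <- colSums(counts) > 0 & apply(counts, 2, var) > 0
x_log <- log2(counts[, keep] + 1)         # +1 avoids log(0); an assumed offset

# K-means with 4 centers and the table of labels vs clusters
km <- kmeans(x_log, centers = 4, nstart = 20)
tab <- table(subtype, cluster = km$cluster)
tab
```

Each row of the table shows how one true label is spread over the clusters; cluster numbers are arbitrary, so a good result is one dominant cell per row rather than a diagonal.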
Spectrum clustering: to scale or not to scale?
Apply PCA on the centered and scaled dataset. How many PCs should
we use and why? You are encouraged to use
irlba::irlba().
Plot PC1 vs PC2 of the centered and scaled data and PC1 vs PC2 of
the centered but unscaled data side by side. Should we scale or not
scale for the clustering process? Why? (Hint: to put plots side by side, use
gridExtra::grid.arrange() or
ggpubr::ggarrange() or egg::ggarrange() for
ggplots; use fig.show="hold" as chunk option for base
plots)
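The comparison can be sketched with prcomp() from base R, used here in place of irlba::irlba(), which the exercise suggests for large matrices. The data are simulated, with one high-variance column carrying the group signal; the object names and group structure are assumptions made for illustration.

```r
set.seed(2)
n <- 100
grp <- rep(1:2, each = n / 2)

# One high-variance signal column plus nine unit-variance noise columns
x <- cbind(
  signal = rnorm(n, mean = ifelse(grp == 1, -5, 5), sd = 2),
  matrix(rnorm(n * 9), n, 9)
)

pca_scaled   <- prcomp(x, center = TRUE, scale. = TRUE)
pca_unscaled <- prcomp(x, center = TRUE, scale. = FALSE)

# Proportion of variance explained by each PC (basis for a scree plot)
pve_scaled   <- pca_scaled$sdev^2 / sum(pca_scaled$sdev^2)
pve_unscaled <- pca_unscaled$sdev^2 / sum(pca_unscaled$sdev^2)
round(rbind(scaled = pve_scaled[1:4], unscaled = pve_unscaled[1:4]), 3)

# PC1 vs PC2 side by side with base graphics
op <- par(mfrow = c(1, 2))
plot(pca_scaled$x[, 1:2], col = grp, main = "Centered and scaled")
plot(pca_unscaled$x[, 1:2], col = grp, main = "Centered, unscaled")
par(op)
```

In this toy example scaling shrinks the one informative column to the same variance as the noise columns, so PC1 explains much less after scaling. Whether the same holds for the mRNA data has to be judged from its own plots.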
Spectrum clustering: center but do not scale the data
Use the first 4 PCs of the centered and unscaled data and apply kmeans. Find a reasonable number of clusters using the within-cluster sum of squares with the elbow rule.
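The elbow rule can be sketched as follows: run kmeans for a range of k and plot the total within-cluster sum of squares. The matrix pc_scores below is a simulated stand-in for the first 4 PC scores, with three groups built in by construction.

```r
set.seed(3)
# Simulated stand-in for 4 PC scores: three well-separated groups of 40 points
centers <- matrix(c( 0, 0, 0, 0,
                     6, 6, 0, 0,
                    -6, 6, 0, 0), ncol = 4, byrow = TRUE)
grp <- rep(1:3, each = 40)
pc_scores <- centers[grp, ] + matrix(rnorm(120 * 4), ncol = 4)

# Total within-cluster sum of squares for k = 1, ..., 8
wss <- sapply(1:8, function(k) {
  kmeans(pc_scores, centers = k, nstart = 20)$tot.withinss
})

# The "elbow" is where adding another cluster stops paying off
plot(1:8, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

Here the curve drops sharply up to k = 3 and flattens afterwards, which is the pattern the elbow rule looks for.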
Choose an optimal cluster number and apply kmeans. Compare the real sub-type and the clustering label as follows: Plot a scatter plot of PC1 vs PC2. Use point color to indicate the true cancer type and point shape to indicate the clustering label. Plot the kmeans centroids with black dots. Summarize how good the clustering results are compared to the real sub-type.
Compare the clustering result from applying kmeans to the original data and the clustering result from applying kmeans to 4 PCs. Does PCA help in kmeans clustering? What might be the reasons if PCA helps?
Now we have an x patient with breast cancer but with unknown sub-type. We have this patient’s mRNA sequencing data. Project this x patient to the space of PC1 and PC2. (Hint: remember that we removed some genes with no counts or no variability, took the log, and centered the data.) Plot this patient in the plot in iv) with a black dot. Calculate the Euclidean distance between this patient and each cluster centroid. Can you tell which sub-type this patient might have?
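Projecting a new observation amounts to applying the training centering and then multiplying by the PC loadings. The sketch below shows this on simulated data; train, x_new, and the two-group structure are hypothetical, and for the real data the new patient would first need the same gene filtering and log transform as the training set.

```r
set.seed(4)
n <- 40; p <- 5
grp <- rep(1:2, each = n / 2)
# Simulated training data: two groups shifted by -3 and +3 in every column
train <- matrix(rnorm(n * p), n, p) + ifelse(grp == 1, -3, 3)

pca <- prcomp(train, center = TRUE, scale. = FALSE)
km  <- kmeans(pca$x[, 1:2], centers = 2, nstart = 20)

# A new observation, already on the same scale as the training columns
x_new <- rep(3, p) + rnorm(p, sd = 0.5)

# Center with the training means, then project onto the first two loadings
pc_new <- as.numeric((x_new - pca$center) %*% pca$rotation[, 1:2])

# Euclidean distance from the projected point to each k-means centroid
dists <- sqrt(rowSums(sweep(km$centers, 2, pc_new)^2))
nearest <- unname(which.min(dists))

plot(pca$x[, 1:2], col = km$cluster)
points(km$centers, pch = 16, cex = 1.5)   # centroids
points(pc_new[1], pc_new[2], pch = 8)     # new observation
```

The manual projection agrees with predict(pca, newdata = matrix(x_new, nrow = 1)). The nearest centroid only suggests a cluster, so mapping it to a sub-type relies on how well the clusters match the true labels.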
This question utilizes the Auto dataset from ISLR. The
original dataset contains 408 observations about cars. It is similar to
the CARS dataset that we use in our lectures. To get the data, first
install the package ISLR. The Auto dataset should be loaded
automatically. We’ll use this dataset to practice the methods learned so
far. The original data source is here: https://archive.ics.uci.edu/ml/datasets/auto+mpg
Get familiar with this dataset first. Tip: you can use the command
?ISLR::Auto to view a description of the dataset.
Explore the data, with particular focus on pairwise plots and summary statistics. Briefly summarize your findings and any peculiarities in the data.
What effect does time have on MPG?
Start with a simple regression of mpg
vs. year and report R’s summary output. Is
year a significant variable at the .05 level? State what
effect year has on mpg, if any, according to
this model.
Add horsepower on top of the variable
year to your linear model. Is year still a
significant variable at the .05 level? Give a precise interpretation of
the year’s effect found here.
The two 95% CI’s for the coefficient of year differ among (i) and (ii). How would you explain the difference to a non-statistician?
Create a model with interaction by fitting
lm(mpg ~ year * horsepower). Is the interaction effect
significant at .05 level? Explain the year effect (if any).
Remember that the same variable can play different roles! Take a
quick look at the variable cylinders, and try to use this
variable in the following analyses wisely. We all agree that a larger
number of cylinders will lower mpg. However, we can interpret
cylinders as either a continuous (numeric) variable or a
categorical variable.
Fit a model that treats cylinders as a
continuous/numeric variable. Is cylinders significant at
the 0.01 level? What effect does cylinders play in this
model?
Fit a model that treats cylinders as a
categorical/factor. Is cylinders significant at the .01
level? What is the effect of cylinders in this model?
Describe the cylinders effect over
mpg.
What are the fundamental differences between treating
cylinders as a continuous and categorical variable in your
models?
Can you test the null hypothesis: fit0: mpg is
linear in cylinders vs. fit1: mpg relates to
cylinders as a categorical variable at .01 level?
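Because a straight line in cylinders is a special case of a separate mean for each cylinder count, the two models are nested and can be compared with an F test through anova(). The sketch uses mtcars, which ships with base R, purely as a stand-in for ISLR::Auto; its column cyl plays the role of cylinders.

```r
# mtcars is a stand-in for ISLR::Auto; cyl takes the values 4, 6, 8
fit0 <- lm(mpg ~ cyl, data = mtcars)           # numeric: one common slope per cylinder
fit1 <- lm(mpg ~ factor(cyl), data = mtcars)   # factor: a separate mean for each level

# F test of H0: the linear model (fit0) is adequate, against the factor model (fit1)
cmp <- anova(fit0, fit1)
cmp

p_value <- cmp[2, "Pr(>F)"]
p_value < 0.01   # TRUE would mean rejecting linearity at the .01 level
```

With three levels the factor model has one extra parameter, so the test has 1 numerator degree of freedom; with more distinct cylinder counts the difference in degrees of freedom grows accordingly.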
Final modeling question: we want to explore the effects of each feature as best as possible. You may explore interactions, feature transformations, higher order terms, or other strategies within reason. The model(s) should be as parsimonious (simple) as possible unless the gain in accuracy is significant from your point of view.
Describe the final model. Include diagnostic plots with particular focus on the model residuals and diagnoses.
Summarize the effects found.
Predict the mpg of the following car: A red car
built in the US in 1983 that is 180 inches long, has eight cylinders,
displaces 350 cu. inches, weighs 4000 pounds, and has a horsepower of
260. Also give a 95% CI for your prediction.
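A prediction with a 95% interval for a single new car can be obtained from predict() with interval = "prediction". The sketch below again uses mtcars as a stand-in, with an assumed two-predictor model; the variable names wt and hp belong to mtcars, not to Auto, and the final model for this question would use its own predictors.

```r
# Stand-in model on mtcars (wt is in units of 1000 lbs in this data set)
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)

# New car described only through the predictors the model actually uses
new_car <- data.frame(wt = 4.0, hp = 260)

# interval = "prediction" covers a single new observation;
# interval = "confidence" would cover only the mean response
pred <- predict(fit_demo, newdata = new_car,
                interval = "prediction", level = 0.95)
pred
```

Only variables that appear in the fitted model affect the prediction, so descriptors that are not predictors in the model are ignored by predict().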
This exercise is designed to help you understand the linear model using simulations. In this exercise, we will generate \((x_i, y_i)\) pairs so that all linear model assumptions are met.
Presume that \(\mathbf{x}\) and \(\mathbf{y}\) are linearly related with a normal error \(\boldsymbol{\varepsilon}\) , such that \(\mathbf{y} = 1 + 1.2\mathbf{x} + \boldsymbol{\varepsilon}\). The standard deviation of the error \(\varepsilon_i\) is \(\sigma = 2\).
We can create a sample input vector (\(n = 40\)) for \(\mathbf{x}\) with the following code:
## [1] 0.0000 0.0256 0.0513 0.0769 0.1026 0.1282 0.1538 0.1795 0.2051 0.2308
## [11] 0.2564 0.2821 0.3077 0.3333 0.3590 0.3846 0.4103 0.4359 0.4615 0.4872
## [21] 0.5128 0.5385 0.5641 0.5897 0.6154 0.6410 0.6667 0.6923 0.7179 0.7436
## [31] 0.7692 0.7949 0.8205 0.8462 0.8718 0.8974 0.9231 0.9487 0.9744 1.0000
## [1] -0.253 1.398 -0.610 4.283 1.782 -0.487 2.159 2.692 2.398 0.666
## [11] 4.331 2.118 0.127 -3.029 3.681 1.372 1.460 3.411 3.196 2.772
## [21] 3.453 3.210 1.826 -2.271 2.978 1.657 1.488 -1.111 0.905 2.728
## [31] 4.640 1.748 2.760 1.908 -0.708 1.247 1.319 2.020 4.369 3.726
Create a corresponding output vector for \(\mathbf{y}\) according to the equation
given above. Use set.seed(1). Then, create a scatterplot
with \((x_i, y_i)\) pairs. Base R
plotting is acceptable, but if you can, please attempt to use
ggplot2 to create the plot. Make sure to have clear labels
and sensible titles on your plots.
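A minimal base R sketch of this step, assuming the input vector is the evenly spaced grid printed above (40 points from 0 to 1) and R's default random number generator. With set.seed(1) placed immediately before the draw, the result should match the vector printed before this exercise that begins -0.253, 1.398. The plot labels are illustrative choices.

```r
# Input: 40 evenly spaced points on [0, 1] (matches the printed x values)
x <- seq(0, 1, length.out = 40)

# Output: y = 1 + 1.2 x + error, with error ~ N(0, sd = 2)
set.seed(1)
y <- 1 + 1.2 * x + rnorm(40, sd = 2)

# Scatterplot with the true regression line for reference
plot(x, y, pch = 16,
     xlab = "x", ylab = "y",
     main = "Simulated data: y = 1 + 1.2x + error")
abline(a = 1, b = 1.2, lty = 2)   # the true line, not a fitted one
```

Any other seed, or drawing other random numbers between set.seed() and rnorm(), gives a different y vector and therefore different estimates later on.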
## [1] 0.671 0.524 2.455 2.206 -0.254 -0.261 1.914 2.752 1.021 3.039
## [11] 2.104 0.114 2.051 -0.859 4.297 5.422 0.758 -0.565 2.693 1.315
## [21] 6.419 1.568 3.056 1.764 0.252 2.147 -1.810 4.762 2.168 6.238
## [31] 2.874 0.534 3.206 0.147 -0.461 2.660 1.221 2.141 2.318 1.021
## x y
## 1 0.0000 0.671
## 2 0.0256 0.524
## 3 0.0513 2.455
## 4 0.0769 2.206
## 5 0.1026 -0.254
## 6 0.1282 -0.261
## 7 0.1538 1.914
## 8 0.1795 2.752
## 9 0.2051 1.021
## 10 0.2308 3.039
## 11 0.2564 2.104
## 12 0.2821 0.114
## 13 0.3077 2.051
## 14 0.3333 -0.859
## 15 0.3590 4.297
## 16 0.3846 5.422
## 17 0.4103 0.758
## 18 0.4359 -0.565
## 19 0.4615 2.693
## 20 0.4872 1.315
## 21 0.5128 6.419
## 22 0.5385 1.568
## 23 0.5641 3.056
## 24 0.5897 1.764
## 25 0.6154 0.252
## 26 0.6410 2.147
## 27 0.6667 -1.810
## 28 0.6923 4.762
## 29 0.7179 2.168
## 30 0.7436 6.238
## 31 0.7692 2.874
## 32 0.7949 0.534
## 33 0.8205 3.206
## 34 0.8462 0.147
## 35 0.8718 -0.461
## 36 0.8974 2.660
## 37 0.9231 1.221
## 38 0.9487 2.141
## 39 0.9744 2.318
## 40 1.0000 1.021
Find the LS estimates of \(\boldsymbol{\beta}_0\) and \(\boldsymbol{\beta}_1\) using the lm() function. What are the true values of \(\boldsymbol{\beta}_0\) and \(\boldsymbol{\beta}_1\)? Do the estimates
look to be good?
##
## Call:
## lm(formula = y ~ x, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.753 -1.218 0.104 0.880 4.570
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.532 0.580 2.64 0.012 *
## x 0.616 0.998 0.62 0.540
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.87 on 38 degrees of freedom
## Multiple R-squared: 0.00994, Adjusted R-squared: -0.0161
## F-statistic: 0.382 on 1 and 38 DF, p-value: 0.54
From the summary output above, the LS estimate of \(\boldsymbol{\beta}_0\) is 1.532 and the LS estimate of \(\boldsymbol{\beta}_1\) is 0.616.
The true value of \(\boldsymbol{\beta}_0\) is 1.
The true value of \(\boldsymbol{\beta}_1\) is 1.2.
Both estimates fall within one standard error of the true values (standard errors 0.580 and 0.998), so they are consistent with the truth, although the slope is estimated imprecisely with only 40 observations.
The RSE for this model is 1.87, which is slightly smaller than, but close to, the true \(\sigma = 2\).
The 95% confidence interval for \(\boldsymbol{\beta}_1\) is \(0.616 \pm 2.024 \times 0.998\) = [-1.404, 2.636], where 2.024 is the 0.975 quantile of the t distribution with 38 degrees of freedom. This confidence interval does capture the true \(\boldsymbol{\beta}_1\) of 1.2. If we had more samples (higher n), this confidence interval would become narrower.
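The arithmetic behind a 95% t interval for the slope can be checked directly from the estimate and standard error reported in the lm summary output above (0.616 and 0.998, with 38 residual degrees of freedom). The variable names below are illustrative.

```r
# Values read off the summary output for x
est <- 0.616   # slope estimate
se  <- 0.998   # its standard error
df  <- 38      # residual degrees of freedom

# 95% interval: estimate +/- t quantile * standard error
t_star <- qt(0.975, df)
ci <- est + c(-1, 1) * t_star * se
round(c(t_star = t_star, lower = ci[1], upper = ci[2]), 3)

# Does the interval contain the true slope 1.2?
ci[1] <= 1.2 && 1.2 <= ci[2]
```

With a fitted model object at hand, confint() applied to it returns the same kind of interval without the manual steps.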
## `geom_smooth()` using formula = 'y ~ x'
From this we can see that the residuals are roughly normal and the linear model assumptions are most likely met. To be conservative, interpretation should still proceed with caution, as the distribution of the residuals may not be unimodal.
This part aims to help you understand the notion of sampling statistics and confidence intervals. Let’s concentrate on estimating the slope only.
Generate 100 samples of size \(n = 40\), and estimate the slope coefficient from each sample. We include some sample code below, which should guide you in setting up the simulation. Note: this code is easier to follow but suboptimal; see the appendix for a more optimal R-like way to run this simulation.
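One way such a simulation could be set up in base R is sketched below. The seed, the column names lower and upper, and the use of a t interval are assumptions of this sketch; only results$b1 is named in the exercise.

```r
set.seed(1)                       # arbitrary seed, an assumption of this sketch
n_sim <- 100
n <- 40
x <- seq(0, 1, length.out = n)
t_star <- qt(0.975, df = n - 2)   # multiplier for a 95% t interval

results <- data.frame(b1 = numeric(n_sim),
                      lower = numeric(n_sim),
                      upper = numeric(n_sim))

for (i in seq_len(n_sim)) {
  # Fresh errors each round; x stays fixed
  y <- 1 + 1.2 * x + rnorm(n, sd = 2)
  est <- coef(summary(lm(y ~ x)))["x", ]
  results$b1[i]    <- est[["Estimate"]]
  results$lower[i] <- est[["Estimate"]] - t_star * est[["Std. Error"]]
  results$upper[i] <- est[["Estimate"]] + t_star * est[["Std. Error"]]
}

# Sampling distribution of the slope vs theory: mean 1.2, sd = sigma / sqrt(Sxx)
summary(results$b1)
theory_sd <- 2 / sqrt(sum((x - mean(x))^2))
c(simulated_sd = sd(results$b1), theory_sd = theory_sd)

# Share of the 100 intervals that contain the true slope
coverage <- mean(results$lower <= 1.2 & results$upper >= 1.2)
coverage
```

The intervals with lower > 1.2 or upper < 1.2 are the ones that miss the true slope; with 100 samples the observed coverage will typically land a few points either side of 95%.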
Summarize the LS estimates of \(\boldsymbol{\beta}_1\) (stored in results$b1). Does the sampling distribution agree with
theory?
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.36 0.23 1.04 1.04 1.85 3.97
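The mean of the sampling distribution of LS estimates for \(\boldsymbol{\beta}_1\) is 1.04 in the summary above, which is close to the true \(\boldsymbol{\beta}_1\) of 1.2. Theory says the LS slope is unbiased with standard deviation \(\sigma / \sqrt{\sum_i (x_i - \bar{x})^2} \approx 1.07\), so the average of 100 estimates has a standard error of about 0.107, and the gap of 0.16 is about 1.5 standard errors. The sampling distribution is therefore consistent with the theory.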
Given that these are 95% confidence intervals, we expect about 95% of our 100 confidence intervals to capture the true value of \(\boldsymbol{\beta}_1\).
## [1] 0.96
The output above gives a coverage proportion of 0.96, which is close to the nominal 95%.
Which intervals don’t cover the true value?
The intervals drawn in red are the ones that do not contain the true value of \(\boldsymbol{\beta}_1\). With a coverage proportion of 0.96, they make up the remaining 4 of the 100 intervals.